FORM PTO-1390 U.S. DEPARTMENT OF COMMERCE PATENT AND TRADEMARK OFFICE

(REV. 11-2000) 

TRANSMITTAL LETTER TO THE UNITED STATES 
DESIGNATED/ELECTED OFFICE (DO/EO/US) 
CONCERNING A FILING UNDER 35 U.S.C. 371 


ATTORNEY 'S DOCKET NUMBER 

R-341894 


U.S. APPLICATION NO. (If known, see 37 CFR 1.5)

09/830902


INTERNATIONAL APPLICATION NO. INTERNATIONAL FILING DATE
PCT/FR00/02433 4 September 2000


PRIORITY DATE CLAIMED 
3 September 1999 


TITLE OF INVENTION CLONING, EXPRESSION AND CHARACTERIZATION OF THE SPG4 GENE RESPONSIBLE FOR THE 
MOST COMMON FORM OF AUTOSOMAL DOMINANT SPASTIC PARAPLEGIA 


APPLICANT(S) FOR DO/EO/US

Jean Weissenbach, Jamile Hazan



Applicant herewith submits to the United States Designated/Elected Office (DO/EO/US) the following items and other information: 

1. ☒ This is a FIRST submission of items concerning a filing under 35 U.S.C. 371.

2. □ This is a SECOND or SUBSEQUENT submission of items concerning a filing under 35 U.S.C. 371. 

3. □ This is an express request to begin national examination procedures (35 U.S.C. 371(f)). The submission must include
items (5), (6), (9) and (21) indicated below.

4. ☒ The US has been elected by the expiration of 19 months from the priority date (Article 31).

5. ☒ A copy of the International Application as filed (35 U.S.C. 371(c)(2))

a. ☒ is attached hereto (required only if not communicated by the International Bureau).

b. □ has been communicated by the International Bureau.

c. □ is not required, as the application was filed in the United States Receiving Office (RO/US).

6. ☒ An English language translation of the International Application as filed (35 U.S.C. 371(c)(2)).

a. ☒ is attached hereto.

b. □ has been previously submitted under 35 U.S.C. 154(d)(4).

7. □ Amendments to the claims of the International Application under PCT Article 19 (35 U.S.C. 371(c)(3))

a. □ are attached hereto (required only if not communicated by the International Bureau).

b. □ have been communicated by the International Bureau.

c. □ have not been made; however, the time limit for making such amendments has NOT expired.

d. □ have not been made and will not be made.

8. □ An English language translation of the amendments to the claims under PCT Article 19 (35 U.S.C. 371(c)(3)).

9. An oath or declaration of the inventor(s) (35 U.S.C. 371(c)(4)).

10. □ An English language translation of the annexes of the International Preliminary Examination Report under PCT
Article 36 (35 U.S.C. 371(c)(5)).

Items 11 to 20 below concern document(s) or information included: 

11. An Information Disclosure Statement under 37 CFR 1.97 and 1.98.

12. An assignment document for recording. A separate cover sheet in compliance with 37 CFR 3.28 and 3.31 is included.

13. A FIRST preliminary amendment.

14. A SECOND or SUBSEQUENT preliminary amendment.

15. A substitute specification.

16. A change of power of attorney and/or address letter.

17. A computer-readable form of the sequence listing in accordance with PCT Rule 13ter.2 and 35 U.S.C. 1.821 - 1.825.

18. A second copy of the published international application under 35 U.S.C. 154(d)(4).

19. A second copy of the English language translation of the international application under 35 U.S.C. 154(d)(4).

20. Other items or information:

See Attachment A.
U.S. APPLICATION NO. (if known, see 37 CFR 1.5)



INTERNATIONAL APPLICATION NO. 

PCT/FR00/02433 



ATTORNEY'S DOCKET NUMBER 

R-341894 



21. The following fees are submitted:                    CALCULATIONS    PTO USE ONLY

BASIC NATIONAL FEE (37 CFR 1.492 (a) (1) - (5)):

Neither international preliminary examination fee (37 CFR 1.482)
nor international search fee (37 CFR 1.445(a)(2)) paid to USPTO
and International Search Report not prepared by the EPO or JPO . . .

International preliminary examination fee (37 CFR 1.482) not paid to
USPTO but International Search Report prepared by the EPO or JPO $860.00

International preliminary examination fee (37 CFR 1.482) not paid to USPTO
but international search fee (37 CFR 1.445(a)(2)) paid to USPTO $710.00

International preliminary examination fee (37 CFR 1.482) paid to USPTO
but all claims did not satisfy provisions of PCT Article 33(1)-(4) $690.00

International preliminary examination fee (37 CFR 1.482) paid to USPTO
and all claims satisfied provisions of PCT Article 33(1)-(4) $100.00

ENTER APPROPRIATE BASIC FEE AMOUNT = $ 860.00



Surcharge of $130.00 for furnishing the oath or declaration later than □ 20 ☒ 30
months from the earliest claimed priority date (37 CFR 1.492(e)).    $ 130.00



CLAIMS                NUMBER FILED    NUMBER EXTRA    RATE         $
Total claims          40 - 20 =       20              x $18.00     $ 360.00
Independent claims    28 - 3 =        25              x $80.00     $ 2,000.00
MULTIPLE DEPENDENT CLAIM(S) (if applicable)           + $270.00    $

TOTAL OF ABOVE CALCULATIONS =                                      $ 3,350.00




□ Applicant claims small entity status. See 37 CFR 1.27. The fees indicated above
are reduced by 1/2.                                                + $

SUBTOTAL =                                                         $ 3,350.00

Processing fee of $130.00 for furnishing the English translation later than □ 20 □ 30
months from the earliest claimed priority date (37 CFR 1.492(f)).  $

TOTAL NATIONAL FEE =                                               $ 3,350.00

Fee for recording the enclosed assignment (37 CFR 1.21(h)). The assignment must be
accompanied by an appropriate cover sheet (37 CFR 3.28, 3.31). $40.00 per property    + $

TOTAL FEES ENCLOSED =                                              $ 3,350.00
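
The arithmetic behind the total on the fee sheet above can be checked with a short sketch. It is illustrative only and is not part of the filing: the rates, included-claim allowances and claim counts are the ones printed on the sheet, while the function and constant names are hypothetical.

```python
from decimal import Decimal

# Amounts and allowances as printed on the fee sheet (rev. 11-2000).
BASIC_FEE = Decimal("860.00")              # ISR prepared by the EPO or JPO
OATH_SURCHARGE = Decimal("130.00")         # oath or declaration furnished late
TOTAL_CLAIM_RATE = Decimal("18.00")        # per total claim in excess of 20
INDEPENDENT_CLAIM_RATE = Decimal("80.00")  # per independent claim in excess of 3
TOTAL_CLAIMS_INCLUDED = 20
INDEPENDENT_CLAIMS_INCLUDED = 3


def extra_claims(filed: int, included: int) -> int:
    """Number of claims above the allowance covered by the basic fee."""
    return max(filed - included, 0)


def national_fee(total_claims: int, independent_claims: int) -> Decimal:
    """Basic fee plus surcharge plus excess-claim fees (hypothetical helper)."""
    total_extra = extra_claims(total_claims, TOTAL_CLAIMS_INCLUDED)
    independent_extra = extra_claims(independent_claims, INDEPENDENT_CLAIMS_INCLUDED)
    return (
        BASIC_FEE
        + OATH_SURCHARGE
        + total_extra * TOTAL_CLAIM_RATE
        + independent_extra * INDEPENDENT_CLAIM_RATE
    )


fee = national_fee(total_claims=40, independent_claims=28)
print(fee)  # 3350.00
```

With 40 total claims and 28 independent claims, the excess-claim fees are $360.00 and $2,000.00, which together with the basic fee and the surcharge give the $3,350.00 entered on the sheet.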






Amount to be refunded: $
Amount to be charged: $



a. A check in the amount of $ 3,350.00 to cover the above fees is enclosed.

b. □ Please charge my Deposit Account No.        in the amount of $        to cover the above fees.
A duplicate copy of this sheet is enclosed.

c. ☒ The Commissioner is hereby authorized to charge any additional fees which may be required, or credit any
overpayment to Deposit Account No. 13-2100. A duplicate copy of this sheet is enclosed.

d. □ Fees are to be charged to a credit card. WARNING: Information on this form may become public. Credit card
information should not be included on this form. Provide credit card information and authorization on PTO-2038.



NOTE: Where an appropriate time limit under 37 CFR 1.494 or 1.495 has not been met, a petition to revive (37 CFR 
1.137 (a) or (b)) must be filed and granted to restore the application to pending status. 
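
The 20-month and 30-month periods mentioned on this form run from the earliest claimed priority date. The sketch below is a rough illustration, not legal advice: it adds calendar months to the priority date stated in this transmittal (3 September 1999) and compares the results with the Express Mail deposit date (2 May 2001). The helper `add_months` is hypothetical, and the calculation ignores weekends, holidays and any other rule that may shift an actual deadline.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day of the month is clamped to the last day of the target
    month (for example, 31 January + 1 month gives 28 or 29 February).
    """
    month_index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))


priority_date = date(1999, 9, 3)   # priority date claimed in the transmittal
deposit_date = date(2001, 5, 2)    # date of deposit on the Express Mail certificate

twenty_months = add_months(priority_date, 20)
thirty_months = add_months(priority_date, 30)

print(twenty_months)                  # 2001-05-03
print(thirty_months)                  # 2002-03-03
print(deposit_date <= thirty_months)  # True
```

On this simplified calendar arithmetic, the deposit date falls before the end of both periods.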

SEND ALL CORRESPONDENCE TO:

Joseph Krieger
Mason, Kolehmainen, Rathburn & Wyss
853 Sanders Road, #330
Northbrook, Illinois 60062

SIGNATURE

Joseph Krieger

Registration number: 25,595
FORM PTO-1390 U.S. DEPARTMENT OF COMMERCE PATENT AND TRADEMARK OFFICE

(REV. 11-2000) 

TRANSMITTAL LETTER TO THE UNITED STATES 
DESIGNATED/ELECTED OFFICE (DO/EO/US) 
CONCERNING A FILING UNDER 35 U.S.C. 371 


ATTORNEY 'S DOCKET NUMBER 

R-341894 


U.S. APPLICATION NO. (If known, see 37 CFR 1.5)

09/830902 


INTERNATIONAL APPLICATION NO. INTERNATIONAL FILING DATE 
PCT/FR00/02433 4 September 2000 


PRIORITY DATE CLAIMED 
3 September 1999 


TITLE OF INVENTION CLONING, EXPRESSION AND CHARACTERIZATION OF THE SPG4 GENE RESPONSIBLE FOR THE
MOST COMMON FORM OF AUTOSOMAL DOMINANT SPASTIC PARAPLEGIA


APPLICANT(S) FOR DO/EO/US Jean Weissenbach, Jamile Hazan



Item 20 - Other Items or Information: 



(a) Certificate Of Mailing By Express Mail For National Phase 
Application Of International Application No. PCT/FR00/02433 

(b) Seven (7) Drawing sheets 

(c) Identification of Inventors 

(d) Application Data Sheet 

(e) Copies of 

(i) PCT Request (Form PCT/RO/101) 

(ii) PCT Notification of Receipt of Record Copy (Form PCT/IB/301) 

(iii) PCT Notification Concerning Submission or Transmittal of 
Priority Document (Form PCT/IB/304) 

(iv) PCT Notice Informing Applicant of the Communication of the 
International Application to the Designated Offices (Form 
PCT/IB/308)



CERTIFICATE OF MAILING BY EXPRESS MAIL
FOR NATIONAL PHASE APPLICATION OF 
INTERNATIONAL APPLICATION NO. PCT/FR00/02433 

"Express Mail" Mailing Label No. EL713287403US. 

I hereby certify that the Transmittal Letter to the United States 
Designated/Elected Office (DO/EO/US) Concerning a Filing Under 35 
U.S.C. 371 and the documents referred to therein and the fee referred to 
therein are being deposited with the United States Postal Service 
"EXPRESS MAIL POST OFFICE TO ADDRESSEE" service under 
37 C.F.R. 1.10 on the date indicated below and is addressed to the 
Commissioner for Patents, Washington, D.C. 20231 




05/02/2001 

Date of Deposit 



Joseph Krieger 



Typed/printed name of person signing 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



09/830902 
Application No. :



U.S. National Serial No. : 



Filed : 



PCT International Application No. : 



PCT/FR00/02433



VERIFICATION OF A TRANSLATION 



I, the below named translator, hereby declare that: 
My name and post office address are as stated below; 

That I am knowledgeable in the French language in which the below identified international 
application was filed, and that, to the best of my knowledge and belief, the English translation 
of the international application No. PCT/FR00/02433 is a true and complete translation of the
above identified international application as filed. 

I hereby declare that all the statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made 
are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United 
States Code and that such willful false statements may jeopardize the validity of the patent 
application issued thereon. 



Date: 20 April 2001 




Full name of the translator : 



Elaine Patricia PARRISH 



For and on behalf of RWS Group plc



Post Office Address : 



Europa House, Marsham Way, 
Gerrards Cross, Buckinghamshire, 
England. 
UNITED STATES PATENT AND TRADEMARK OFFICE
Applicants: Jean Weissenbach, Jamile Hazan 

Application: CLONING, EXPRESSION AND CHARACTERIZATION OF THE 

SPG4 GENE RESPONSIBLE FOR THE MOST COMMON FORM OF 
AUTOSOMAL DOMINANT SPASTIC PARAPLEGIA 

Serial No.: Herewith Art Unit: 

Filing Date: Herewith Examiner: 

Case: R-341894 

PCT Application Information: 

PCT Serial No: PCT/FR00/02433 Priority Filing Date: September 3, 1999 

PCT Filing Date: September 4, 2000 

CERTIFICATE OF MAILING BY EXPRESS MAIL: "Express Mail" Mailing Label No. EL713287403US 

I hereby certify that this paper and/or fee is being deposited with the United States Postal Service "EXPRESS MAIL POST 
OFFICE TO ADDRESSEE" service under 37 C.F.R. 1.10 on the date indicated below and is addressed to the Commissioner 
for Patents, Washington, D.C. 20231 




Joseph Krieger 



Typed/printed name of person signing 



853 Sanders Road, #330 
Northbrook, Illinois 60062 
May 2, 2001 



Box "PCT" 

Commissioner for Patents 
Washington, D.C. 20231 



FIRST PRELIMINARY AMENDMENT 



Sir: 



Prior to the calculation of the application filing fees in connection with the
above-identified application, please amend the above-identified application as follows
(a marked up version of the amended claims is on the pages following the remarks):

IN THE SPECIFICATION:

Page 1, following the title, insert as a centered title --BACKGROUND OF THE INVENTION--.

Page 1, prior to the first full paragraph beginning at line 5 and following
the inserted centered title "BACKGROUND OF THE INVENTION", insert as a subtitle
--1. Field of the Invention--.

Page 1, prior to the second full paragraph beginning at line 12, insert as
a subtitle --2. Background of the Invention--.

Page 2, prior to the paragraph beginning at line 32, insert as a centered
title --SUMMARY OF THE INVENTION--.

Page 21, line 30, replace paragraph "LEGENDS OF THE FIGURES"
with a centered title --BRIEF DESCRIPTION OF THE DRAWINGS--.

Page 24, prior to line 1, insert as a centered title --DETAILED
DESCRIPTION OF THE PREFERRED EMBODIMENTS--.

Page 43, following the last paragraph, insert:

--What is claimed and desired to be secured by Letters Patent of
the United States is:--.

IN THE CLAIMS 

Rewrite claims 3-5, 8-10, 12-19, 21, 23-26, and 29-36 as follows: 

3. (amended) Purified or isolated nucleic acid according to claim 1, 
characterized in that it comprises at least one sequence of at least 15 consecutive 
nucleotides of the nt 714-809, ends inclusive, fragment of the sequence SEQ ID No. 2, 
of the sequence complementary thereto or of the sequence of the corresponding RNA 
thereof. 

4. (amended) Purified or isolated nucleic acid according to claim 1, 
characterized in that it comprises a mutation corresponding to a natural polymorphism 
in humans. 

5. (amended) Probe or primer, characterized in that it comprises a 
sequence of a nucleic acid according to claim 1.

8. (amended) Method for screening cDNA or genomic DNA 
libraries, or for cloning isolated genomic or cDNA encoding spastin, characterized in 
that it uses a nucleic acid sequence according to claim 1.

9. (amended) Method according to claim 8, for identifying the
genomic or cDNA sequence of the SPG4 gene of mammals. 

10. (amended) Method for identifying a mutation carried by the human 
SPG4 gene, characterized in that it uses a nucleic acid sequence according to claim 1. 

12. (amended) Method for identifying the nucleic acid sequences 
which promote and/or regulate the expression of the SPG4 gene, characterized in that it 
uses a nucleic acid sequence according to claim 1.

13. (amended) Nucleic acid identified using a method according to
claim 9.

14. (amended) Polypeptide encoded by a nucleic acid according to
claim 1.

15. (amended) Polypeptide according to claim 14, with the exception 
of the 584 amino acid peptide, the sequence of which is identified in the GenBank 
databank under the accession number AB029006. 

16. (amended) Polypeptide according to claim 14, characterized in that 
it comprises an amino acid sequence chosen from the group comprising: 
a) the sequence SEQ ID No. 3, the sequence SEQ ID No. 73,
the sequence SEQ ID No. 107 or the sequence of at least 10 consecutive amino acids of 
one of these sequences; and 

b) the sequences which are homologs or variants of the 
sequences SEQ ID No. 3, SEQ ID No. 73 or SEQ ID No. 107. 

17. (amended) Polypeptide according to claim 14, characterized in that 
it comprises the sequence of at least 8 consecutive amino acids of the sequence of the aa 
197-228, ends inclusive, fragment of the sequence SEQ ID No. 3. 

18. (amended) Polypeptide according to claim 14, characterized in that 
it comprises an amino acid sequence chosen from the group comprising the sequence 
SEQ ID No. 3, the sequence SEQ ID No. 73, the sequence SEQ ID No. 107, which 
sequences carrying at least one of the mutations corresponding to a natural 
polymorphism in humans, and the sequences of the fragments thereof of at least 10 
consecutive amino acids. 

19. (amended) Cloning and/or expression vector containing a nucleic
acid sequence according to claim 1.

21. (amended) Host cell transformed with a vector according to claim
19.

23. (amended) Mammal, except a human, according to claim 22, 
comprising a transformed cell, characterized in that the sequence of at least one of the 
two alleles of the SPG4 gene contains at least one of the mutations corresponding to a 
natural polymorphism in humans. 

24. (amended) Use of a nucleic acid sequence according to claim 5, as 
a probe or primer, for detecting and/or amplifying nucleic acid sequences. 

25. (amended) Use of a nucleic acid sequence according to claim 1, 
for screening a genomic or cDNA library. 

26. (amended) Use of a nucleic acid sequence according to claim 1, 
for producing a recombinant or synthetic polypeptide. 

29. (amended) Monoclonal or polyclonal antibodies or their 
fragments, chimeric antibodies or immunoconjugates, characterized in that they are 
capable of specifically recognizing a polypeptide according to claim 14.

30. (amended) Method for detecting and/or purifying a polypeptide,
characterized in that it uses an antibody according to claim 29. 

31. (amended) Method for genotypic diagnosis of AD-HSP associated 
with the SPG4 gene, characterized in that a nucleic acid sequence according to claim 1 
is used. 

32. (amended) Method for genotypic diagnosis of AD-HSP associated 
with the presence of at least one mutation on a sequence of the SPG4 gene, using a 
biological sample from a patient, characterized in that it includes the following steps: 

a) where appropriate, isolation of the genomic DNA from the 
biological sample to be analyzed, or production of cDNA from the RNA of the 
biological sample; 

b) specific amplification of said DNA sequence of the SPG4 
gene likely to contain a mutation, using primers according to claim 5; 

c) analysis of the amplification products obtained and 
comparison of their sequence with the corresponding normal sequence of the SPG4 
gene.

33. (amended) Method for diagnosing AD-HSP associated with
abnormal expression of a polypeptide encoded by the SPG4 gene, characterized in that 
one or more antibodies according to claim 29 is brought into contact with the biological 
material to be tested, under conditions which allow the possible formation of specific 
immunological complexes between said polypeptide and said antibody, and in that the 
immunological complexes possibly formed are detected and/or quantified. 

34. (amended) Method for selecting a chemical or biochemical 
compound which is capable of modulating the expression or the activity of a 
polypeptide encoded by the SPG4 gene, characterized in that it comprises bringing a
nucleic acid sequence according to claim 1 into contact with a candidate compound, and 
detecting a modification of the activity of said polypeptide. 

35. (amended) Use of a nucleic acid sequence according to claim 1, 
for studying the expression or the activity of the SPG4 gene. 

36. (amended) Kit for diagnosis, characterized in that it comprises at 
least a nucleic acid according to claim 5.

Add the following new claims 37-40:

37. (new claim) Method for selecting a chemical or biochemical 
compound which is capable of modulating the expression or the activity of a 
polypeptide encoded by the SPG4 gene, characterized in that it comprises bringing a
nucleic acid sequence according to claim 14 into contact with a candidate compound, 
and detecting a modification of the activity of said polypeptide. 

38. (new claim) Use of a polypeptide according to claim 14 for 
studying the expression or the activity of the SPG4 gene. 

39. (new claim) Kit for diagnosis, characterized in that it comprises at 
least an antibody according to claim 29. 

40. (new claim) Use of an antibody according to claim 29 for studying 
the expression or the activity of the SPG4 gene. 

IN THE SEQUENCE LISTING 

Line <110>, replace "CENTRE NATIONAL DE LA RECHERCHE
SCIENTIFIQUE - CNRS" with the following:

--Weissenbach, Jean
Hazan, Jamile--.

Line <130>, replace "D18374" with --R-341894--.

REMARKS

This Amendment is being presented in connection with the filing of the 
above-identified application which is a national phase application of the above- 
identified international (PCT) application. A substitute specification is being submitted 
concurrently with this Amendment and the above-identified application. This substitute 
specification is in accordance with the translation of the originally filed international 
application including the specification and claims; and an Abstract of the Disclosure 
(added as page 48 based on the abstract that was included in the published International 
Application (Publication No. WO 01/18198 A1)). In addition, seven (7) drawing
sheets are being submitted concurrently with this Amendment and the above-identified 
application for use in connection with the above-identified application. These sheets 
are in accordance with the drawings appearing in International Publication 
No. WO 01/18198 A1.

With respect to the above amendments to the specification, subtitles have 
been added in order to conform the application to the requirements for applications of 
the United States Patent and Trademark Office. With respect to the above amendments 
to the claims, the claims have been amended principally so that each of the claims is 
dependent on a single claim rather than on multiple claims. 
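
The single-dependency structure described above can be illustrated with a small sketch. The mapping below is transcribed from the clean amended and new claims in this Amendment (each listed claim refers back to exactly one earlier claim); claims that the Amendment does not rewrite are deliberately left out because their text is not reproduced here. The names `PARENT` and `dependency_chain` are hypothetical and serve only this illustration.

```python
# Claim -> the single claim it refers to, as recited in the amended and new claims.
PARENT = {
    3: 1, 4: 1, 5: 1, 8: 1, 9: 8, 10: 1, 12: 1, 13: 9, 14: 1,
    15: 14, 16: 14, 17: 14, 18: 14, 19: 1, 21: 19, 23: 22,
    24: 5, 25: 1, 26: 1, 29: 14, 30: 29, 31: 1, 32: 5, 33: 29,
    34: 1, 35: 1, 36: 5, 37: 14, 38: 14, 39: 29, 40: 29,
}


def dependency_chain(claim: int) -> list[int]:
    """Follow the single back-reference of each claim until none is recorded."""
    chain = [claim]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(dependency_chain(13))  # [13, 9, 8, 1]
print(dependency_chain(39))  # [39, 29, 14, 1]
print(dependency_chain(23))  # [23, 22]
```

Because every entry maps to one claim number, each amended claim has a single linear chain of references, which is the property the amendment is intended to achieve.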
In accordance with 37 C.F.R. §§1.821-1.825, a sequence listing of 98
pages is being concurrently submitted herewith. This sequence listing is the listing
submitted with the above-identified international application, but does incorporate the
above amendments to line <110> (where the inventors were substituted for the
assignee) and line <130> (where the attorney docket number for the above-identified
national phase application was substituted for the attorney docket number of the
international application). In further conformity with 37 C.F.R. §1.824, the sequence
listing, as amended above, is being provided in computer readable form on a diskette in
conformity with 37 C.F.R. §1.824(c)(1).

Respectfully submitted, 

Mason, Kolehmainen, Rathburn & Wyss 

(Customer #008668) 



By

Joseph Krieger (Reg. No. 25,595)
853 Sanders, #330 
Northbrook, Illinois 60062 
Telephone: 847-509-3720 
Facsimile: 847-509-3722 
e-mail: j_krieger@compuserve.com 
Version of Amended Claim with Markings to Show Changes Made

The following is a marked up version of claims showing the amendments 
made to that claim (the changes are shown by underlining added matter and striking 
through deleted matter): 
IN THE CLAIMS 

Rewrite claims 3-5, 8-10, 12-19, 21, 23-26, and 29-36 as follows: 

3. (amended) Purified or isolated nucleic acid according to claim 1 or
2, characterized in that it comprises at least one sequence of at least 15 consecutive
nucleotides of the nt 714-809, ends inclusive, fragment of the sequence SEQ ID No. 2,
of the sequence complementary thereto or of the sequence of the corresponding RNA
thereof.

4. (amended) Purified or isolated nucleic acid according to one of
claims 1 to 3 claim 1, characterized in that it comprises a mutation corresponding to a
natural polymorphism in humans.

5. (amended) Probe or primer, characterized in that it comprises a
sequence of a nucleic acid according to one of claims claim 1 to 4.

8. (amended) Method for screening cDNA or genomic DNA
libraries, or for cloning isolated genomic or cDNA encoding spastin, characterized in
that it uses a nucleic acid sequence according to one of claims claim

9. (amended) Method according to claim 8, for identifying the
genomic or cDNA sequence of the SPG4 gene of mammals, in particular of mice.

10. (amended) Method for identifying a mutation carried by the human
SPG4 gene, characterized in that it uses a nucleic acid sequence according to one of
claims claim 1 to 7.

12. (amended) Method for identifying the nucleic acid sequences 
which promote and/or regulate the expression of the SPG4 gene, characterized in that it 
uses a nucleic acid sequence according to one of claims claim He^. 

13. (amended) Nucleic acid identified using a method according to one
of claims claim 9 to 12.

14. (amended) Polypeptide encoded by a nucleic acid according to one
of claims claim 1 to 4 and 13.

15. (amended) Polypeptide according to claim 14, preferably with the
exception of the 584 amino acid peptide, the sequence of which is identified in the
GenBank databank under the accession number AB029006.

16. (amended) Polypeptide according to claim 14 or 15, characterized
in that it comprises an amino acid sequence chosen from the group comprising:

a) the sequence SEQ ID No. 3, the sequence SEQ ID No. 73,
the sequence SEQ ID No. 107 or the sequence of at least 10 consecutive amino acids of
one of these sequences; and

b) the sequences which are homologs or variants of the
sequences SEQ ID No. 3, SEQ ID No. 73 or SEQ ID No. 107.

17. (amended) Polypeptide according to claim 14 or 15, characterized
in that it comprises the sequence of at least 8 consecutive amino acids of the sequence
of the aa 197-228, ends inclusive, fragment of the sequence SEQ ID No. 3.

18. (amended) Polypeptide according to claim 14 or 15, characterized
in that it comprises an amino acid sequence chosen from the group comprising the
sequence SEQ ID No. 3, the sequence SEQ ID No. 73, the sequence SEQ ID No. 107,
which sequences carrying at least one of the mutations corresponding to a natural
polymorphism in humans, and the sequences of the fragments thereof of at least 10
consecutive amino acids.

19. (amended) Cloning and/or expression vector containing a nucleic
acid sequence according to one of claims claim 1 to 4, and 13.

21. (amended) Host cell transformed with a vector according to claim

23. (amended) Mammal, except a human, according to claim 22,
comprising a transformed cell, characterized in that the sequence of at least one of the
two alleles of the SPG4 gene contains at least one of the mutations corresponding to a
natural polymorphism in humans or identified using a method according to claim 10 or

24. (amended) Use of a nucleic acid sequence according to one of
claims claim 5, 6 and 13, as a probe or primer, for detecting and/or amplifying nucleic
acid sequences.

25. (amended) Use of a nucleic acid sequence according to one of
claims claim 1 to 7, and 13, for screening a genomic or cDNA library.

26. (amended) Use of a nucleic acid sequence according to one of
claims claim 1 to 4 and 13, for producing a recombinant or synthetic polypeptide.

29. (amended) Mono- Monoclonal or polyclonal antibodies or their
fragments, chimeric antibodies or immunoconjugates, characterized in that they are
capable of specifically recognizing a polypeptide according to one of claims claim 14 to
18, and 28.

30. (amended) Method for detecting and/or purifying a polypeptide
according to one of claims 14 to 18, and 28, characterized in that it uses an antibody
according to claim 29.

31. (amended) Method for genotypic diagnosis of AD-HSP associated 
with the SPG4 gene, characterized in that a nucleic acid sequence according to one of 
claims claim 1 to 7 and 13 is used. 

32. (amended) Method for genotypic diagnosis of AD-HSP associated 
with the presence of at least one mutation on a sequence of the SPG4 gene, using a 
biological sample from a patient, characterized in that it includes the following steps: 

a) where appropriate, isolation of the genomic DNA from the 
biological sample to be analyzed, or production of cDNA from the RNA of the 
biological sample;

b) specific amplification of said DNA sequence of the SPG4
gene likely to contain a mutation, using primers according to either of claims claim 5
and 6 or a nucleic acid according to claim 13;

c) analysis of the amplification products obtained and
comparison of their sequence with the corresponding normal sequence of the SPG4
gene.

33. (amended) Method for diagnosing AD-HSP associated with
abnormal expression of a polypeptide encoded by the SPG4 gene, characterized in that
one or more antibodies according to claim 29 is (are) brought into contact with the
biological material to be tested, under conditions which allow the possible formation of
specific immunological complexes between said polypeptide and said antibody or
antibodies, and in that the immunological complexes possibly formed are detected
and/or quantified.

34. (amended) Method for selecting a chemical or biochemical
compound which is capable of interacting directly or indirectly with a polypeptide
according to one of claims 14 to 18, and 28, or with a nucleic acid according to one of
claims 1 to 7, and 13, and/or which makes it possible to modulate modulating the
expression or the activity of these a polypeptides polypeptide encoded by the SPG4
gene, characterized in that it comprises bringing a nucleic acid sequence according to
one of claims claim 1 to 7, and 13, a polypeptide according to one of claims 14 to 18,
and 28, a vector according to either of claims 19 and 20, a cell according to claim 21, a
mammal according to either of claims 22 and 23 or an antibody according to claim 29
into contact with a candidate compound, and detecting a modification of the activity of
said polypeptide.

35. (amended) Use of a nucleic acid sequence according to one of
claims claim 1 to 7, and 13, of a polypeptide according to one of claims 14 to 18, and
28, of a vector according to either of claims 19 and 20, of a cell according to claim 21,
of a mammal according to either of claims 22 and 23 or of an antibody according to
claim 29, for studying the expression or the activity of the SPG4 gene.

36. (amended) Kit or pack for diagnosis, characterized in that it
comprises at least one compound chosen from the following group of compounds:

a) a nucleic acid according to either of claims claim 5 and 6;

and

b) an antibody according to claim 29.
CLONING, EXPRESSION AND CHARACTERIZATION OF THE SPG4 GENE
RESPONSIBLE FOR THE MOST COMMON FORM OF AUTOSOMAL DOMINANT 
SPASTIC PARAPLEGIA. 

5 The invention relates to the identification and characterization of the SPG4 gene 

encoding spastin, which is responsible for the most common form of autosomal 
dominant hereditary spastic paraplegia (HSP), to the cloning and characterization of its 
cDNA, and also to the corresponding polypeptides. The invention also relates to 
vectors, to transformed cells and to transgenic animals, and also to diagnostic methods
and kits and to methods for selecting a chemical or biochemical compound capable of
interacting directly or indirectly with a polypeptide according to the invention. 

Hereditary spastic paraplegias (HSPs) are degenerative disorders of the central 
nervous system, characterized by bilateral and progressive spasticity of the lower limbs.
They manifest clinically as difficulties in walking, possibly evolving into
total paralysis of both legs. The physiopathology of this set of diseases is, to date,
relatively undocumented; however, anatomopathological data make it possible to 
conclude that the attack is limited to the pyramidal tracts responsible for voluntary 
motricity in the spinal cord (Reid, 1997). Various clinical and genetic forms of HSP 
exist. The so-called "pure" HSPs, which correspond to isolated spasticity of the lower
limbs, are clinically distinguished from the "complex" HSPs, for which the spasticity of
the legs is associated with other clinical signs of neurological or non-neurological type 
(Bruyn et al., 1991). From a genetic point of view, the HSPs can be transmitted 
according to the autosomal dominant (AD-HSP), autosomal recessive (AR-HSP) or
X-linked (X-HSP) mode. The "pure" form of HSP, which is most commonly transmitted
according to the autosomal dominant mode, remains the most frequent (approximately
80% of HSPs) (Reid, 1997). The incidence of HSPs, which remains difficult to estimate
because epidemiological studies are rare and clinical variability is considerable, ranges
from 0.9 : 100 000 in Denmark and 3 to 9.6 : 100 000 in certain regions of Spain (Polo et
al., 1991) to 14 : 100 000 in Norway (Skre, 1974) (approximately 3 : 100 000 in
France).

In addition to this great clinical variability, which is observed not only between 
various families but also between various affected members of the same family, the 
HSPs are also characterized by considerable genetic heterogeneity. In the case of 
AD-HSPs, four loci have been identified, to date, on chromosomes 14 (locus SPG3)
(Hazan et al., 1993), 2 (locus SPG4) (Hazan et al., 1994; Hentati et al., 1994), 15
(locus SPG6) (Fink et al., 1995) and 8 (locus SPG8) (Hedera et al., 1999). The study of
a large number of families exhibiting an AD-HSP has shown that the gene carried by 
chromosome 2 is a main locus of this form of the disease, found in 40 to 50% of the 
families analyzed (The Hereditary Spastic Paraplegia Working Group, 1996; Durr et al.,
1996). An anticipation phenomenon was observed in some locus SPG4-linked HSP
families; this phenomenon has, subsequently, been associated with the expansion of a 
(CAG)n repeat demonstrated in 6 Danish families (Nielsen et al., 1997) using the RED 
(for Repeat Expansion Detection) technique. It has, however, never been possible to
confirm this expansion in any of the families tested by this method or by the systematic
search for sequences of (CAG)n type in physical maps composed of YAC (for Yeast
Artificial Chromosome) or BAC (for Bacterial Artificial Chromosome) clones (Hazan et 
al., Genomics, 60 (3), 309-19, 1999). 

To date, three genes responsible for two forms of X-HSP and one form of AR-HSP
have been identified. Mutations in the gene which encodes a neuron-specific cell
adhesion molecule, L1-CAM (for L1 Cell Adhesion Molecule), and which is located at
Xq28 (locus SPG1) cause a complex form of HSP (Jouet et al., 1994) in which the 
spasticity is associated with a mental handicap, whereas mutations in the PLP (for 
ProteoLipid Protein) gene located at Xq21 (locus SPG2), which encodes a constitutive 
molecule of the myelin layer, cause pure and complex forms of X-HSP (Saugier-Veber,
P. et al., 1994). More recently, mutations in the gene located at 16q24.3 (locus SPG7),
which encodes paraplegin, a mitochondrial ATPase of the AAA (for "ATPases 
Associated with diverse cellular Activities") protein family (Confalonieri et al., 1995), 
have been associated with complex and pure forms of AR-HSP (Casari et al., 1998). 

Thus, there remains, today, a great need to identify and characterize the gene
responsible for the most common form of AD-HSP. The identification of this gene
should, in particular, allow, besides the possibility of a test for antenatal screening in 
the families concerned, a better understanding of some of the molecular mechanisms 
engendering these degenerations specific for nerve bundles of the spinal cord, or even 
make it possible to provide an elementary response regarding therapeutic treatment for
the patients.

This is precisely the subject of the present invention. 

After having delimited the localization range between the D2S352 and D2S2347 
genetic markers by studying recombination events in locus SPG4-linked HSP families, 
the inventors have established a contig of BACs covering a physical distance evaluated
at approximately 1.5 Mb and have undertaken a positional cloning strategy based on
sequencing the SPG4 range in order to completely identify all the genes located in the
candidate region. The analysis of the sequence of the two BACs, D (b336P14) and 
G (B763N4), has revealed the presence of a gene which is composed of 17 exons, 
extending over a distance of approximately 100 kb, and which exhibits homology with
the genes encoding proteins of the AAA family. Comparison of the sequence of this
gene between the healthy and affected individuals of AD-HSP families has made it 
possible to demonstrate various mutations in the patients. 

A subject of the invention is thus the identification and characterization of the 
SPG4 (or SPAST) gene encoding a novel nuclear member of the AAA family,
responsible for the most common form of AD-HSP.

In a first aspect, a subject of the present invention is a purified or isolated 
nucleic acid of the SPG4 gene, characterized in that it comprises at least 15 
consecutive nucleotides, preferably 20, 25, 30, 35, 40, 45, 50, 75, 100 or 200 
consecutive nucleotides, of a sequence chosen from the group comprising: 

- the sequence SEQ ID No. 1, which is a genomic sequence of the human SPG4 gene;

- the nucleic acid sequences which are homologs or variants of the nucleic acid of 
sequence SEQ ID No. 1; 

- the sequence which is complementary thereto; and 

- the sequence of the corresponding RNA thereof. 

The present invention relates, of course, to both the DNA and RNA sequences,
and also the sequences which hybridize with them, as well as the corresponding
double-stranded DNAs. 
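
By way of illustration only, the relationships listed above (complementary sequence, corresponding RNA, fragments of at least 15 consecutive nucleotides) can be sketched in Python. The function names and the toy sequence are assumptions introduced for this example; the toy sequence is not SEQ ID No. 1.

```python
# Sketch only: `toy` is a made-up sequence, not SEQ ID No. 1.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA carrying the same sequence as `dna` (T replaced by U)."""
    return dna.upper().replace("T", "U")


def fragments(seq: str, length: int = 15):
    """Yield every run of `length` consecutive nucleotides of `seq`."""
    for start in range(len(seq) - length + 1):
        yield seq[start:start + length]


toy = "ATGGCCAAGTTCGATCGTACGGA"  # hypothetical 23-nucleotide example
print(reverse_complement(toy))
print(to_rna(toy))
print(sum(1 for _ in fragments(toy)))  # number of 15-nucleotide fragments
```

A sequence of n nucleotides contains n - 14 overlapping fragments of 15 consecutive nucleotides, which is what the last line counts.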

The terms "nucleic acid", "nucleic acid sequence" or "sequence of nucleic acid",
"polynucleotide", "oligonucleotide", "polynucleotide sequence" and "nucleotide
sequence", which will be used interchangeably in the present description, are intended to
refer to a double-stranded DNA, a single-stranded DNA and products of
transcription of said DNAs, and/or an RNA fragment, said natural isolated or synthetic
fragments, which may or may not include unnatural nucleotides, referring to a precise
series of nucleotides, which may or may not be modified, making it possible to define a
fragment or a region of a nucleic acid. The expression "natural isolated, or synthetic
DNA and/or RNA fragment, which may or may not include unnatural nucleotides" is 
intended to mean a precise series of nucleotides, which may or may not be modified, 
making it possible to define a fragment, a segment or a region of a nucleic acid. 

It should be understood that the present invention does not relate to the
genomic nucleotide sequences in their natural chromosomal environment, i.e. in the
natural state. It involves sequences which have been isolated and/or purified, i.e. they
have been removed directly or indirectly, for example by copying, their environment 
having been at least partially modified. 

The term "homologous nucleic acid sequence" is intended to refer to the
sequences which have, with respect to the reference nucleic acid sequence, certain
modifications, such as in particular a deletion, a truncation, an extension, a chimeric 
fusion and/or a mutation, in particular a point mutation, and the nucleic acid sequence 
of which shows at least 80%, preferably 90% or 95%, identity after alignment, with the 
reference nucleic acid sequence. 

For the purpose of the present invention, the term "percentage of identity"
between two nucleic acid or amino acid sequences is intended to refer to a percentage
of nucleotides or of amino acid residues which are identical between the two 
sequences to be compared, obtained after the best alignment, this percentage being 
purely statistical and the differences between the two sequences being distributed
randomly and throughout their length. Sequence comparisons between two nucleic
acid or amino acid sequences are traditionally carried out by comparing these 
sequences after having optimally aligned them, said comparison being carried out by 
segment or by "window of comparison" in order to identify and compare local regions of 
sequence similarity. The optimal alignment of the sequences for comparison can be
produced, besides manually, by means of the local homology algorithm of Smith and
Waterman (1981) [Adv. Appl. Math. 2:482], by means of the homology alignment algorithm of
Needleman and Wunsch (1970) [J. Mol. Biol. 48:443], by means of the similarity search
method of Pearson and Lipman (1988) [Proc. Natl. Acad. Sci. USA 85:2444], and by
means of computer programs using these algorithms (GAP, BESTFIT, FASTA and
TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575
Science Dr., Madison, WI, or with the BLAST N or BLAST P comparison programs).

The percentage of identity between two nucleic acid or amino acid sequences is 
determined by comparing these two optimally aligned sequences by window of 
comparison in which the region of the nucleic acid or amino acid sequence to be
compared can comprise additions or deletions with respect to the reference sequence
for optimal alignment between these two sequences. The percentage of identity is 
calculated by determining the number of identical positions for which the nucleotide or 
the amino acid residue is identical between the two sequences, dividing this number of 
identical positions by the total number of positions in the window of comparison and
multiplying the result obtained by 100 so as to obtain the percentage of identity
between these two sequences. 
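
The calculation just described can be sketched as follows. This is a minimal illustration, assuming that the two sequences have already been optimally aligned, that gaps are written as "-", and that the whole alignment is taken as the window of comparison; the function names, the example strings and the use of the 80% homology threshold as a default are choices made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions over the window of comparison.

    Both arguments are assumed to be already aligned (same length,
    gaps written as '-'). A gap never counts as an identical position.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


def is_homolog(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when identity reaches the threshold (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned pair: 8 identical positions out of 10.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Here the gap column and the single mismatch are the two non-identical positions, so the pair sits exactly at the 80% threshold.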

For example, the BLAST program "BLAST 2 sequences" (Tatusova et al., "Blast
2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS
Microbiol. Lett. 174:247-250), available on the site
http://www.ncbi.nlm.nih.gov/gorf/bl2.html, may be used, the parameters used being
those given by default (in particular for the parameters "open gap penalty": 5, and
"extension gap penalty": 2; the matrix chosen being, for example, the "BLOSUM 62"
matrix proposed by the program), the percentage of identity between the two
sequences to be compared being calculated directly by the program.

It preferably involves sequences for which the complementary sequences are 
capable of hybridizing specifically with one of the sequences of the invention. 
Preferably, the specific or high stringency hybridization conditions will be such that they 
ensure at least 80%, preferably 90% or 95%, identity after alignment between one of
the two sequences and the sequence which is complementary to the other.

Hybridization under high stringency conditions means that the temperature and 
ionic strength conditions are chosen such that they allow the hybridization between two 
complementary DNA fragments to be maintained. By way of illustration, high stringency 
conditions of the hybridization step for the purposes of defining the polynucleotide
fragments described above are advantageously as follows.

The DNA-DNA or DNA-RNA hybridization is carried out in two steps:
(1) prehybridization at 42°C for 3 hours in phosphate buffer (20 mM, pH 7.5) containing
5 x SSC (1 x SSC corresponds to a 0.15 M NaCl + 0.015 M sodium citrate solution),
50% of formamide, 7% of sodium dodecyl sulfate (SDS), 10 x Denhardt's, 5% of
dextran sulfate and 1% of salmon sperm DNA; (2) actual hybridization for 20 hours at a
temperature dependent on the size of the probe (i.e. 42°C for a probe of size > 100
nucleotides), followed by two 20-minute washes at 20°C in 2 x SSC + 2% SDS and one
20-minute wash at 20°C in 0.1 x SSC + 0.1% SDS. The final wash is carried out in
0.1 x SSC + 0.1% SDS for 30 minutes at 60°C for a probe of size > 100 nucleotides.
The high stringency hybridization conditions described above for a polynucleotide of
defined size will be adjusted by those skilled in the art for oligonucleotides of greater or
smaller size, according to the teaching of Sambrook et al., 1989.

The term "nucleic acid sequence which is a variant" or "nucleic acid which is a 
variant" of a reference nucleic acid sequence will be intended to refer to the set of
nucleic acid sequences corresponding to allelic variants, i.e. individual variations of the
reference nucleic acid sequence. These natural mutated sequences correspond to
polymorphisms present in mammals, in particular in human beings, and in particular to 
polymorphisms which can cause a pathology to occur and/or to develop. 

While the sequences according to the invention relate to normal sequences,
they also relate to sequences which are mutated insofar as they include at least one
point mutation, and preferably at most 10% of mutations, with respect to the normal 
sequence. 

In particular, the variant nucleic acid sequences will comprise any sequence of 
at least 15 consecutive nucleotides, preferably 20, 25, 30, 50, 100 or 200 consecutive
nucleotides, of a polymorphic sequence of the genomic sequence of the human SPG4
gene of sequence SEQ ID No. 1, and the nucleic acid sequence of which has, with
respect to the sequence SEQ ID No. 1, at least one mutation corresponding in
particular to a truncation, deletion, substitution and/or addition of an amino acid
residue. In the present case, the variant nucleic acid sequences having at least one
mutation will herein be linked to the pathologies of AD-HSP type linked to the SPG4 locus.

Preferably, the present invention relates to the mutated nucleic acid sequences 
in which the mutations produce a modification of the amino acid sequence of the 
polypeptide encoded by the normal sequence. 

The term "variant nucleic acid sequences" will also be intended to refer to any
RNA or cDNA resulting from a mutation of a splice site of the genomic nucleic acid
sequence SEQ ID No. 1. 

Preferably, the invention relates to a purified or isolated nucleic acid of the 
SPG4 gene according to the invention, characterized in that it comprises a sequence 
chosen from the group comprising: 
a) the sequence SEQ ID No. 1, the sequence SEQ ID No. 2, the sequence SEQ ID
No. 72, the sequence SEQ ID No. 106 or the sequence of at least 15, preferably 20, 
25, 30, 35, 40, 45, 50, 75, 100 or 200, consecutive nucleotides of the sequence 
SEQ ID No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106; 

b) the nucleic acid sequences which are homologs or variants of the sequences SEQ
ID No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106; and

c) the complementary sequence or the RNA sequence corresponding to the 
sequences as defined in a) and b), 

preferably with the exception of the nucleic acid identified in the GenBank database 
under the accession number AB029006.

The nucleic acid, the sequence of which is disclosed in the GenBank database
under the accession number AB029006 corresponds to the sequence of one of the 
100 cDNAs derived from a human brain mRNA library identified by the Kazusa DNA 
Research Institute in Japan (Kikuno et al., DNA Research, 6, 197-205, 1999).

Preferably, the invention relates to a purified or isolated nucleic acid according
to the invention, characterized in that it comprises at least one sequence of at least 15
consecutive nucleotides, preferably 20, 25, 30, 50 or 75 consecutive nucleotides, of the 
fragment nt 714-809, ends inclusive, of the sequence SEQ ID No. 2, of the sequence
complementary thereto or of the sequence of the corresponding RNA thereof.

The invention preferably relates to a purified or isolated nucleic acid according
to the present invention, characterized in that it comprises a sequence chosen from the
following group: 

- the sequence SEQ ID No. 1;

- the sequence SEQ ID No. 2, which is the cDNA sequence encoding human spastin;

- the sequences SEQ ID No. 72 and SEQ ID No. 106, the sequence SEQ ID No. 72
representing the sequence of the incomplete cDNA encoding murine spastin
represented in Figure 5, "mouse" line, and the SEQ ID No. 106 representing the 
complete sequence thereof; 

- the nucleic acid sequences which are homologs or variants of the sequences SEQ ID
No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106;

- the sequence complementary thereto; and 

- the sequence of the corresponding RNA thereof. 

Preferably, the invention relates to a purified or isolated nucleic acid according 
to the invention, characterized in that it comprises at least one mutation which
corresponds to a natural polymorphism in humans, in particular the position and nature
of which are identified in Table 5. 

The primers or probes, characterized in that they comprise a sequence of a 
nucleic acid according to the invention, also form part of the invention. 

The present invention thus relates to the set of primers which can be deduced
from the nucleotide sequences of the invention and which may make it possible to
demonstrate said nucleotide sequences of the invention, in particular the mutated 
sequences, using in particular an amplification method such as the PCR method, or a 
related method. 

The present invention also relates to the set of probes which can be deduced
from the nucleotide sequences of the invention, in particular from the sequences
capable of hybridizing with them, and which may make it possible to demonstrate said
nucleotide sequences, in particular to distinguish the normal sequences from the
mutated sequences.

The present invention relates, in particular, to the probes or primers having
sequences chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71.

The invention also relates to the use of a nucleic acid sequence according to 
the invention as a probe or primer, for detecting, identifying, assaying or amplifying a 
nucleic acid sequence. 

According to the invention, the polynucleotides which can be used as a probe or
as a primer in processes for detecting, identifying, assaying or amplifying a nucleic acid
sequence will have a minimum size of 15 bases, preferably of 20 bases, or better still 
of 25 to 30 bases. 

The set of probes and primers according to the invention may be labeled 
directly or indirectly with a radioactive or nonradioactive compound, using methods well
known to those skilled in the art, in order to obtain a detectable and/or quantifiable
signal. 

The nonlabeled polynucleotide sequences according to the invention can be 
used directly as a probe or primer. 

The sequences are generally labeled so as to obtain sequences which can be
used for many applications. The labeling of the primers or of the probes according to
the invention is carried out with radioactive elements or with nonradioactive molecules. 

Among the radioactive isotopes used, mention may be made of 32P, 33P, 35S, 3H
or 125I. The nonradioactive entities are selected from ligands, such as biotin, avidin or
streptavidin, digoxigenin, haptens, colorants and luminescent agents, such as
radioluminescent, chemiluminescent, bioluminescent, fluorescent or phosphorescent
agents. 

The polynucleotides according to the invention can thus be used as a primer 
and/or probe in processes using, in particular, the PCR (polymerase chain reaction) 
technique (Erlich, 1989; Innis et al., 1990, and Rolfs et al., 1991). This technique
requires choosing pairs of oligonucleotide primers framing the fragment which must be
amplified. Reference may, for example, be made to the technique described in 
American patent US No. 4,683,202. The amplified fragments can be identified, for 
example after agarose or polyacrylamide gel electrophoresis, or after a 
chromatographic technique such as gel filtration or ion exchange chromatography, and
then sequenced. The specificity of amplification can be controlled using, as a primer,
the nucleotide sequences of polynucleotides of the invention and, as a template, plasmids
containing these sequences or the derived amplification products. The amplified
nucleotide fragments can be used as reagents in hybridization reactions in order to
demonstrate the presence, in a biological sample, of a target nucleic acid having a
sequence complementary to that of said amplified nucleotide fragments.
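
The notion of a pair of primers framing the fragment to be amplified can be illustrated by a simple in silico sketch. It assumes exact primer matching on a single template strand and ignores annealing temperature, mismatches and secondary structure; the function names, the template and the primers are hypothetical and are not taken from SEQ ID No. 4 to SEQ ID No. 71.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
MIN_PRIMER_LENGTH = 15  # minimum probe/primer size stated in the description


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def amplified_fragment(template: str, forward: str, reverse: str):
    """Return the region of `template` framed by the two primers, or None."""
    for primer in (forward, reverse):
        if len(primer) < MIN_PRIMER_LENGTH:
            raise ValueError("primer shorter than 15 bases")
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is what appears on the strand given as `template`.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(site)]


# Hypothetical template and primers, invented for this example.
template = "TTTT" + "ACGTACGTACGTACG" + "GGGCCCAAATTT" + "CATGCATGCATGCAT" + "AAAA"
forward = "ACGTACGTACGTACG"
reverse = reverse_complement("CATGCATGCATGCAT")
product = amplified_fragment(template, forward, reverse)
print(len(product))
```

The returned fragment runs from the first base of the forward primer to the last base of the reverse primer site, which is the product whose size or sequence would then be analyzed.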

The invention is also directed toward the nucleic acids which can be obtained 
by amplification using primers according to the invention. 

Other techniques for amplifying the target nucleic acid can be advantageously 
employed as an alternative to PCR (PCR-like), using pairs of primers having nucleotide
sequences according to the invention. The term "PCR-like" will be intended to refer to
all methods using direct or indirect reproductions of nucleic acid sequences, or in which 
the labeling systems have been amplified. These techniques are, of course, known. In 
general, they involve amplifying the DNA with a polymerase; when the sample of origin 
is an RNA, it is advisable to perform reverse transcription beforehand. There are,
currently, a great many processes which enable this amplification, such as for example
the SDA (Strand Displacement Amplification) technique (Walker et al., 1992), the TAS
(Transcription-based Amplification System) technique described by Kwoh et al. in
1989, the 3SR (Self-Sustained Sequence Replication) technique described by Guatelli
et al. in 1990, the NASBA (Nucleic Acid Sequence Based Amplification) technique
described by Kievits et al. in 1991, the TMA (Transcription Mediated Amplification)
technique, the LCR (Ligase Chain Reaction) technique described by Landegren et al.
in 1988 and improved by Barany et al. in 1991, which uses a heat-stable ligase, the
RCR (Repair Chain Reaction) technique described by Segev in 1992, the CPR (Cycling
Probe Reaction) technique described by Duck et al. in 1990, and the Q-beta-replicase
amplification technique described by Miele et al. in 1983 and improved, in particular, by
Chu et al. in 1986 and Lizardi et al. in 1988, and then by Burg et al., and also by Stone
et al., in 1996.

When the target polynucleotide to be detected is an mRNA, use will 
advantageously be made, prior to carrying out an amplification reaction using the
primers according to the invention or carrying out a detection process using the probes
of the invention, of an enzyme of reverse transcriptase type in order to obtain a cDNA 
from the mRNA contained in the biological sample. The cDNA obtained will then serve 
as a target for the primers or probes used in the amplification or detection process 
according to the invention.

The probe hybridization technique can be carried out in diverse ways (Matthews
et al., 1988). The most general method consists in immobilizing the nucleic acid 
extracted from the cells of various tissues or from cells in culture, on a support (such as 
nitrocellulose, nylon or polystyrene), and in incubating the immobilized target nucleic
acid with the probe, under well defined conditions. After hybridization, the excess probe
is eliminated and the hybrid molecules formed are detected using the appropriate 
method (measurement of the radioactivity, of the fluorescence or of the enzymatic 
activity linked to the probe). 

According to another embodiment of the nucleic acid probes according to the
invention, the latter can be used as a capture probe. In this case, a probe, termed
"capture probe", is immobilized on a support and is used to capture, by specific 
hybridization, the target nucleic acid obtained from the biological sample to be tested, 
and the target nucleic acid is then detected using a second probe, termed "detection
probe", labeled with an easily detectable element.

The splice acceptor or donor site sequences according to the present invention
identified in Table 3 (sequences SEQ ID No. 74 to SEQ ID No. 105) also form part of
the present invention. 

In another aspect, the invention comprises a method for screening cDNA or 
genomic DNA libraries, or for cloning isolated genomic DNA or cDNA encoding spastin,
characterized in that it uses a nucleic acid sequence according to the invention.
Among these methods, mention may be made in particular of:
- the screening of cDNA libraries and the cloning of the isolated cDNAs (Sambrook et 
al., 1989; Suggs et al., 1981; Woo et al., 1979), using the nucleic acid sequences 
according to the invention;
- the screening of genomic libraries, for example of BACs (Chumakov et al., 1992;
Chumakov et al., 1995), and, optionally, a genetic analysis by FISH (Cherif et al., 
1990), using sequences according to the invention, enabling the isolation and 
chromosomal localization, and then the complete sequencing, of the SPG4 gene 
encoding spastin. 

In particular, these methods according to the invention may be used for
identifying and thus obtaining the genomic sequence or the cDNA of the SPG4 gene in
other mammals, in particular mice. 

These screening and/or cloning methods will comprise, in particular, a step of 
hybridization of a nucleic acid according to the invention with a nucleic acid contained
in a genomic or cDNA library.

The invention also comprises a method for identifying the nucleic acid
sequences which promote and/or regulate the expression of the SPG4 gene of 
sequence SEQ ID No. 1, characterized in that it uses a nucleic acid according to the 
invention. 

The computer tools available to those skilled in the art enable them to easily
identify, using the genomic nucleic acid sequences according to the invention, the
promoter regulatory boxes necessary and sufficient for controlling gene expression, in
particular the TATA, CCAAT and GC boxes, and also the stimulatory regulatory
sequences ("enhancers"), or inhibitory regulatory sequences ("silencers"), which
control, in cis, the expression of the genes according to the invention; among these
regulatory sequences, mention should be made of IRE, MRE and CRE.
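
As a rough illustration of such a computer search, the sketch below scans one strand of a sequence for the three promoter boxes named above. The consensus patterns used (TATA[AT]A, CCAAT, GGGCGG) are simplified textbook forms chosen for this example and are an assumption; the description does not specify any particular pattern or tool, and the example sequence is invented.

```python
import re

# Simplified consensus patterns: an assumption made for this sketch.
MOTIFS = {
    "TATA box": "TATA[AT]A",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}


def find_boxes(sequence: str):
    """Return (name, 1-based position, matched text) for each motif hit.

    Only the strand given is scanned, and overlapping hits of the same
    motif are not reported.
    """
    sequence = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, sequence):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 23-nucleotide example, not taken from SEQ ID No. 1.
for hit in find_boxes("GGGCGGTTCCAATCGTATAAAGG"):
    print(hit)
```

A real analysis would also scan the complementary strand and weigh each hit by its distance from the transcription start site.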

The invention also relates to the methods for identifying mutations carried by 
the human SPG4 gene, in particular mutations responsible for autosomal dominant 
hereditary spastic paraplegia, characterized in that they use a nucleic acid sequence
according to the invention.

These methods for identifying these mutations will, in particular, comprise the 
following steps: (i) isolation of the DNA from the biological sample to be analyzed, or 
production of a cDNA from the mRNA of the biological sample; (ii) specific amplification 
of the target DNA likely to have a mutation, using primers according to the invention;
(iii) analysis of the amplification products, in particular the size and/or the sequence of
the amplification products, with respect to a reference sequence. 
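
Step (iii) can be sketched as a comparison of an amplification product with a reference sequence, first by size and then, when the sizes agree, position by position. This is a minimal illustration under the assumption that both sequences are already available as text; the function name, the report layout and the example sequences are hypothetical.

```python
def compare_to_reference(reference: str, product: str) -> dict:
    """Compare an amplification product with a reference sequence.

    Reports the size difference and, when both sequences have the same
    length, every substitution as (1-based position, reference base,
    observed base). Products of a different size would first need an
    alignment, which this sketch does not perform.
    """
    reference, product = reference.upper(), product.upper()
    report = {
        "size_difference": len(product) - len(reference),
        "substitutions": [],
    }
    if report["size_difference"] == 0:
        report["substitutions"] = [
            (index + 1, ref_base, obs_base)
            for index, (ref_base, obs_base) in enumerate(zip(reference, product))
            if ref_base != obs_base
        ]
    return report


# Hypothetical sequences: one substitution, then a 2-nucleotide deletion.
print(compare_to_reference("ACGTACGT", "ACGTTCGT"))
print(compare_to_reference("ACGTACGT", "ACGTGT"))
```

A non-zero size difference points to an insertion or deletion, whereas an empty substitution list with no size difference means that the product matches the reference.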

The expression "methods for identifying a mutation according to the invention" 
is also intended to refer to a method which makes it possible to obtain the nucleic acid 
on which said mutation has been identified.

The promoter and/or regulatory sequences of the SPG4 gene according to the
invention having mutations which may modify the expression of the corresponding
protein also form part of the invention. 

The nucleic acids characterized in that they can be obtained using one of the 
preceding methods according to the invention, or the nucleic acids capable of
hybridizing, under high stringency conditions (homology of at least 80% between one of
the two sequences and the sequence complementary to the other), with said nucleic 
acids, form part of the invention, especially the variant or homologous nucleic acids, in 
particular the nucleic acid sequences of allelic variants of the SPG4 gene of sequence 
SEQ ID No. 1 or of its cDNA of sequence SEQ ID No. 2, and also the genomic
sequences of the homologous genes of other mammals such as mice.

In the present description, the term "Spg4" will be intended to refer to the
mouse gene homologous to the human SPG4 gene. 

The use of a nucleic acid sequence according to the invention as a probe or 
primer for screening a genomic or cDNA library of course forms part of the subject of
the present invention.

In another aspect, the invention comprises a purified or isolated polypeptide 
encoded by a nucleic acid according to the invention, preferably with the exception of 
the 584 amino acid peptide, the sequence of which is identified in the GenBank 
database under the accession number AB029006.

In the present description, the term "polypeptide" will be used to refer equally to
a protein or a peptide.

Preferably, the present invention relates to a polypeptide according to the 
invention, characterized in that it comprises an amino acid sequence chosen from the 
following group: 

- the sequence SEQ ID No. 3, corresponding to human spastin encoded by the
sequence SEQ ID No. 2 of the cDNA of the human SPG4 gene; 

- the sequence SEQ ID No. 73, corresponding to a fragment of murine spastin encoded 
by the sequence SEQ ID No. 72 of the incomplete cDNA of the mouse Spg4 gene,
the sequence SEQ ID No. 73 being represented in Figure 4A, "SPAST_MOUSE" line;

- the sequence SEQ ID No. 107, corresponding to murine spastin encoded by the
sequence SEQ ID No. 106 of the complete cDNA of the mouse Spg4 gene; 

- the sequences of polypeptides which are homologs and variants of the polypeptide of 
sequence SEQ ID No. 3, SEQ ID No. 73 or SEQ ID No. 107; and 

- the sequences of the fragments thereof of at least 8, 10, 15, 30 or 50 consecutive
amino acids.

Also preferably, a subject of the invention is a polypeptide according to the 
invention, characterized in that it comprises an amino acid sequence chosen from the 
group comprising: 

a) the sequence SEQ ID No. 3, the sequence SEQ ID No. 73, the sequence SEQ ID
No. 107 or the sequence of at least 10 consecutive amino acids of one of these
sequences; and

b) the sequences which are homologs or variants of the sequences SEQ ID No. 3, 
SEQ ID No. 73 or SEQ ID No. 107. 

Also preferably, a subject of the invention is a polypeptide according to the
invention, characterized in that it comprises the sequence of at least 8, preferably of at
least 10, 15, 20 or 30, consecutive amino acids of the fragment aa 197-228,
ends inclusive, of the sequence SEQ ID No. 3.

Also preferably, a subject of the invention is a polypeptide according to the 
invention, characterized in that it comprises an amino acid sequence chosen from the
following group:

- the sequence SEQ ID No. 3, the sequence SEQ ID No. 73 and the sequence SEQ ID 
No. 107, which sequences carrying at least one of the mutations corresponding to a 
natural polymorphism in humans, in particular those the nature and location of which 
are identified in Table 5 hereinafter, or those which may be identified using the 

10 methods for identifying mutations of the SPG4 gene, according to the present 
invention; and 

- the sequences of the fragments thereof of at least 8, 10, 15, 30 or 50 consecutive 
amino acids. 

It should be understood that the invention does not relate to polypeptides in
natural form, i.e. as found in their natural environment. Specifically, the invention
relates to the peptides which are obtained by purification from natural sources, or
obtained by genetic recombination or by chemical synthesis, and which can therefore
include unnatural amino acids. The production of a recombinant polypeptide, which can
be carried out using one of the nucleotide sequences according to the invention, is
particularly advantageous since it makes it possible to obtain an increased degree of
purity of the desired polypeptide.

The term "homologous polypeptide" will be intended to refer to the polypeptides
which have certain modifications with respect to the reference polypeptide, such as in
particular one or more deletions or truncations, an extension, a chimeric fusion and/or
one or more substitutions, and the amino acid sequence of which shows at least 80%,
preferably 90% or 95%, identity after alignment, with the reference amino acid
sequence.
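
As a minimal sketch of how such an identity threshold might be checked, the following computes the percentage identity between two sequences that have already been aligned (gaps written as "-"). The text does not specify the alignment program or the denominator; here, as an assumption, identity is the number of identical residue pairs divided by the number of aligned columns. The function names and the short test peptides are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percentage of aligned columns holding the same residue in both
    sequences. Inputs must be pre-aligned and of equal length; '-' is a gap.
    Assumption: the denominator is the number of aligned columns."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_ref, aligned_query)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def is_homolog(aligned_ref: str, aligned_query: str,
               threshold: float = 80.0) -> bool:
    """True when identity after alignment reaches the threshold
    (80% by default; 90 or 95 for the preferred embodiments)."""
    return percent_identity(aligned_ref, aligned_query) >= threshold


# Two substitutions over ten aligned residues: 80% identity.
print(percent_identity("MKTAYIAKQR", "MKTAYIAKAA"))
```

Other conventions (for example dividing by the length of the shorter sequence) give different percentages, so the convention should be stated whenever a figure is reported.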

The term "variant polypeptide" (or protein variant) will be intended to refer to the
set of polypeptides encoded by the variant nucleic acid sequences as defined above.

In particular, the variant polypeptides will comprise any polypeptide which is
encoded by the mutated genomic sequence of the SPG4 gene of sequence SEQ ID
No. 1, and the amino acid sequence of which has at least one mutation corresponding
in particular to a truncation, deletion, substitution and/or addition of amino acid residues
with respect to the sequence SEQ ID No. 3. In the present case, the variant
polypeptides having at least one mutation will be linked to the pathologies of AD-HSP
type.

The term "variant polypeptide" will also be intended to refer to any polypeptide
resulting from mutation of a splice site in the genomic nucleic acid sequence SEQ ID
No. 1.

The invention also comprises the cloning and/or expression vectors containing 
a nucleic acid sequence according to the invention. 

The vectors according to the invention, characterized in that they include the
elements which allow the expression and/or the secretion of said sequences in a host
cell, or a cellular addressing sequence, also form part of the invention.

The vectors characterized in that they include a promoter and/or regulator
sequence according to the invention also form part of the invention.

Said vectors will preferably include a promoter, translation initiation and
termination signals, and also suitable regions for regulating the transcription. They
should be able to be maintained stably in the cell and can, optionally, have particular
signals which specify secretion of the translated protein.

These various control signals are chosen as a function of the host cell used. To
this end, the nucleic acid sequences according to the invention can be inserted into
vectors which replicate autonomously in the host chosen, or vectors which integrate in
the host chosen.

Among the systems which replicate autonomously, use will preferably be made,
as a function of the host cell, of the systems of plasmid or viral type, the viral vectors
possibly in particular being adenoviruses (Perricaudet et al., 1992), retroviruses,
lentiviruses, poxviruses or herpesviruses (Epstein et al., 1992). Those skilled in the art
know the technology which can be used for each of these systems.

When integration of the sequence into the chromosomes of the host cell is 
desired, use may be made, for example, of the systems of plasmid or viral type; such 
viruses will, for example, be retroviruses (Temin, 1986), or AAVs (Carter, 1993). 

Among the nonviral vectors, preference is given to naked polynucleotides such
as naked DNA or naked RNA according to the technique developed by the company
VICAL, yeast artificial chromosomes (YAC) for expression in yeast, mouse artificial
chromosomes (MAC) for expression in murine cells and, preferably, human artificial
chromosomes (HAC) for expression in human cells.

Such vectors will be prepared according to the methods commonly used by
those skilled in the art, and the clones resulting therefrom can be introduced into a
suitable host using standard methods, such as for example lipofection, electroporation
or heat shock.

The invention also comprises the host cells, in particular the eukaryotic and
prokaryotic cells, transformed with the vectors according to the invention, and also the
transgenic animals, except humans, comprising one of said transformed cells
according to the invention.

Among the cells which can be used for these purposes, mention may of course
be made of bacterial cells (Olins and Lee, 1993), but also yeast cells (Buckholz, 1993),
as well as animal cells, in particular cultures of mammalian cells (Edwards and Aruffo,
1993), and especially Chinese hamster ovary (CHO) cells, but also insect cells in which
it is possible to use processes implementing baculoviruses, for example (Luckow,
1993). A preferred cellular host for expressing the proteins of the invention consists of
CHO cells.

Among the mammals according to the invention, preference will be given to
animals such as mice, rats or rabbits, expressing a polypeptide according to the
invention.

Among the mammals according to the invention, preference will also be given to
those comprising a transformed cell characterized in that the sequence of at least one
of the two alleles of the SPG4 gene contains at least one of the mutations
corresponding to a natural polymorphism in humans, in particular those the nature and
location of which are identified in Table 5 hereinafter, or those which may be identified
using the methods for identifying a mutation of the SPG4 gene, according to the
present invention.

Among the mammals according to the invention, preference will also be given to
animals such as mice, rats or rabbits, characterized in that the gene encoding spastin
according to the invention is not functional or is knocked out.

Among the animal models more particularly advantageous herein, there are, in 
particular: 

- the transgenic animals having, at least in one of their two allelic sequences of the
SPG4 gene, at least one of the mutations the position and nature of which are
identified in Table 5 or identified using a method according to the present invention.
These transgenic animals are obtained, for example, by homologous recombination
on embryonic stem cells, transfer of these stem cells to embryos, selection of the
chimeras affected in the reproductive lines, and growth of said chimeras;

- the transgenic animals (preferably mice) overexpressing the SPG4 gene into which
one of said mutations according to the invention may be introduced. The mice are
obtained, for example, by transfection of a copy of this gene under the control of a
strong promoter which is ubiquitous in nature or selective for a tissue type, or after
viral transcription;

- the transgenic animals (preferably mice) made deficient for the SPG4 gene according
to the invention by inactivation using the LOXP/CRE recombinase system (Rohlmann
et al., 1996) or any other system for inactivating the expression of this gene.

The cells and mammals according to the invention can be used in a method for
producing a polypeptide according to the invention, as described below, and can also
be used as a model for analysis and for DNA (genomic or cDNA) library screening.

The transformed cells or mammals as described above can thus be used as
models in order to study the interactions between the polypeptides according to the
invention, and chemical or protein compounds, which are involved directly or indirectly
in the activities of the polypeptides according to the invention, this being in order to
study the various mechanisms and interactions which come into play.

They can especially be used for selecting products which interact with the
polypeptides according to the invention, in particular human spastin of sequence SEQ
ID No. 3 or the variants thereof according to the invention, as a cofactor or as an
inhibitor, in particular a competitive inhibitor, or which have agonist or antagonist
activity for the activity of the polypeptides according to the invention. Preferably, said
transformed cells or transgenic animals will be used as a model which, in particular,
enables the selection of products which make it possible to combat the pathology
linked to the SPG4 gene mentioned above.

The invention also relates to the use of a cell, of a mammal or of a polypeptide
according to the invention for screening a chemical or biochemical compound which
can interact directly or indirectly with the polypeptides according to the invention,
and/or which is capable of modulating the expression or the activity of these
polypeptides.

The invention also relates to the use of a nucleic acid sequence according to
the invention for synthesizing recombinant polypeptides.

The method for producing a polypeptide of the invention in recombinant form is,
itself, included in the present invention, and is characterized in that the transformed
cells, in particular the cells or mammals of the present invention, are cultured under
conditions which allow the expression of a recombinant polypeptide encoded by a
nucleic acid sequence according to the invention, and in that said recombinant
polypeptide is recovered.

The recombinant polypeptides, characterized in that they can be obtained using
said production method, also form part of the invention.

The recombinant polypeptides obtained as indicated above can be in both
glycosylated and nonglycosylated form and may or may not have the natural tertiary
structure.

These polypeptides can be produced based on the nucleic acid sequences
defined above, according to the techniques for producing recombinant polypeptides
known to those skilled in the art. In this case, the nucleic acid sequence used is placed
under the control of signals which allow its expression in a cellular host.

An effective system for producing a recombinant polypeptide requires a vector
and a host cell according to the invention.

These cells can be obtained by introducing into host cells a nucleotide
sequence inserted into a vector as defined above, and then culturing said cells under
conditions which allow the replication and/or expression of the transfected nucleotide
sequence.

The processes for purifying a recombinant polypeptide which are used are
known to those skilled in the art. The recombinant polypeptide can be purified from cell
lysates and extracts and/or from the culture medium supernatant, with methods used
individually or in combination, such as fractionation, chromatography methods,
immunoaffinity techniques using specific monoclonal or polyclonal antibodies, etc.

The polypeptides according to the present invention can be obtained by
chemical synthesis, using one of the many known peptide synthesis methods, for example
the techniques which implement solid phases or techniques which use partial solid
phases, by condensation of fragments or by conventional synthesis in solution.

The solid-phase synthesis technique is well known to those skilled in the art.
See in particular Stewart et al. (1984) and Bodansky (1984).

The polypeptides which are obtained by chemical synthesis and which can
include corresponding unnatural amino acids are also included in the invention.

The mono- or polyclonal antibodies or their fragments, chimeric antibodies or 
immunoconjugates, characterized in that they are capable of specifically recognizing a 
polypeptide according to the invention, form part of the invention. 

Specific polyclonal antibodies can be obtained from a serum of an animal
immunized against the polypeptides according to the invention, in particular produced
by genetic recombination or by peptide synthesis, according to conventional
procedures.

The advantage of antibodies which specifically recognize certain polypeptides,
variants or immunogenic fragments thereof, according to the invention, will in particular
be noted.

The specific monoclonal antibodies can be obtained according to the
conventional hybridoma culture method described by Kohler and Milstein, 1975.

The antibodies according to the invention are, for example, chimeric antibodies,
humanized antibodies, or Fab or F(ab')2 fragments. They can also be in the form of
labeled antibodies or immunoconjugates in order to obtain a detectable and/or
quantifiable signal.

The invention also relates to methods for detecting and/or purifying a
polypeptide according to the invention, characterized in that they use an antibody
according to the invention.

The invention also comprises purified polypeptides, characterized in that they
are obtained using a method according to the invention.

Moreover, besides their use for purifying the polypeptides, the antibodies of the 
invention, in particular the monoclonal antibodies, can also be used for detecting these 
polypeptides in a biological sample. 

They thus constitute a means of immunocytochemically or
immunohistochemically analyzing the expression of the polypeptides according to the
invention, in particular the polypeptide of sequence SEQ ID No. 3 or a variant thereof,
on specific tissue sections, for example by immunofluorescence or gold labeling, or
with enzymatic immunoconjugates.

They may make it possible, in particular, to demonstrate abnormal expression
of these polypeptides in the biological samples or tissues, which makes them useful for
monitoring the progression of the disease and the molecular diagnosis.

More generally, the antibodies of the invention can be advantageously used in
any situation in which the expression of a normal or mutated polypeptide according to
the invention must be observed.

The methods for determining allelic variability, a mutation, a deletion, a loss of
heterozygosity or any genetic abnormality of the SPG4 gene, according to the
invention, characterized in that they use a nucleic acid sequence or an antibody
according to the invention, also form part of the invention.

The present invention thus comprises a method for genotypic diagnosis of the
pathology associated with the SPG4 gene, characterized in that a nucleic acid
sequence according to the invention is used.

Preferably, the invention relates to a method for genotypic diagnosis of the
disease associated with the presence of at least one mutation on a sequence of the
SPG4 gene, using a biological sample from a patient, characterized in that it includes
the following steps:

a) where appropriate, isolation of the genomic DNA from the biological sample to be
analyzed, or production of cDNA from the RNA of the biological sample;
b) specific amplification of said DNA sequence of the SPG4 gene likely to contain a
mutation, using primers according to the invention;
c) analysis of the amplification products obtained and comparison of their sequence
with the corresponding normal sequence of the SPG4 gene.
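
Step c) amounts to a position-by-position comparison once the amplified sequence has been read. The sketch below is only an illustration under simplifying assumptions: both sequences cover the same region, have the same length and contain substitutions only (insertions and deletions would require a prior alignment). The function name, the `offset` parameter and the short test sequences are hypothetical and are not taken from the sequence listing.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str,
                       offset: int = 1) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    `offset` is the coordinate of the first base of the amplified region
    in the reference numbering (1 by default).
    Assumption: equal-length, already co-linear sequences (no indels)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length; align them first")
    ref, smp = reference.upper(), sample.upper()
    return [(offset + i, r, s)
            for i, (r, s) in enumerate(zip(ref, smp)) if r != s]


# Hypothetical 9-nt region whose first base is numbered 100 in the reference.
print(find_substitutions("ATGGCCTTA", "ATGGACTTA", offset=100))
# → [(104, 'C', 'A')]
```

An empty list means that no substitution was found in the region examined; it does not exclude mutations elsewhere in the gene.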

The invention also comprises a method for diagnosing the disease associated
with abnormal expression of a polypeptide encoded by the SPG4 gene, in particular the
polypeptide of sequence SEQ ID No. 3, characterized in that one or more antibodies
according to the invention is (are) brought into contact with the biological material to be
tested, under conditions which allow the possible formation of specific immunological
complexes between said polypeptide and said antibody or antibodies, and in that the
immunological complexes possibly formed are detected and/or quantified.

These methods are, for example, directed toward the methods for diagnosis, in
particular antenatal diagnosis, of AD-HSP associated with the presence of a mutation
in the SPG4 gene, according to the invention, by determining, using a biological
sample from the patient, the presence of mutations in at least one of the sequences
described above. The nucleic acid sequences analyzed may equally be genomic DNA,
cDNA or mRNA.

Nucleic acids or antibodies based on the present invention may also be used to
enable positive diagnosis in a patient or presymptomatic diagnosis in an individual at
risk, in particular an individual with a family history of the disease.

There are, of course, a great number of methods which make it possible to
demonstrate a mutation in a gene with respect to the wild-type gene. They can
essentially be divided into two main categories. The first type of method is that in which
the presence of a mutation is detected by comparing the mutated sequence with the
corresponding wild-type sequence, and the second type is that in which the presence
of the mutation is detected indirectly, for example through evidence of mismatches due
to the presence of the mutation.

These methods can use the probes and primers of the present invention which
have been described. They are generally purified nucleic acid hybridization sequences
comprising at least 15 nucleotides, preferably 20, 25 or 30 nucleotides, characterized in
that they can hybridize specifically with a nucleic acid sequence according to the
invention.

Preferably, the specific hybridization conditions are such as those defined
above or in the examples. The length of these nucleic acid hybridization sequences
can range from 15, 20 or 30 to 200 nucleotides, particularly from 20 to 50 nucleotides.

Among the methods for determining allelic variability, a mutation, a deletion, a
loss of heterozygosity or a genetic abnormality, preference is given to the methods
comprising at least one so-called PCR (polymerase chain reaction) or PCR-like
amplification step for the target sequence according to the invention likely to have an
abnormality, using a pair of primers having nucleotide sequences according to the
invention. The amplified products may be treated with a suitable restriction enzyme
before carrying out the detection and assaying of the product targeted.

The mutations of the SPG4 gene according to the invention may be responsible
for various modifications of the translation product thereof, these modifications possibly
being used for a diagnostic approach. Specifically, the antigenicity modifications linked
to these mutations may allow the development of specific antibodies. The mutated
gene product can be distinguished using these methods. All these modifications can be
employed in a diagnostic approach, using several well-known methods based on the
use of mono- or polyclonal antibodies which recognize the normal polypeptide or
mutated variants, such as for example by RIA or by ELISA.

Thus, a subject of the invention is also a kit or pack for diagnosis, in particular
for diagnosing AD-HSP associated with the presence of a mutation in the SPG4 gene,
according to the invention, characterized in that it comprises at least one compound
chosen from the following group of compounds:

a) a nucleic acid, in particular as a primer or probe, according to the present invention;
and

b) an antibody according to the invention.

In another aspect, the invention comprises a method for selecting a chemical or
biochemical compound capable of preventing and/or treating AD-HSP associated with
the SPG4 gene, characterized in that a nucleic acid sequence according to the
invention, a polypeptide according to the invention, a vector according to the invention,
a cell according to the invention, a mammal according to the invention or an antibody
according to the invention is used.

The methods for selecting chemical or biochemical compounds capable of
interacting directly or indirectly with polypeptides according to the invention or with the
nucleic acids according to the invention, and/or making it possible to modulate the
expression or the activity of these polypeptides, characterized in that they comprise
bringing a polypeptide according to the invention, a transformed cell according to the
invention or a mammal according to the invention into contact with a candidate
compound, and detecting a modification of the activity of said polypeptide, are also
included in the invention.

For example, but without being limited thereto, mention may be made of a
method for identifying molecules capable of interacting with a polypeptide according to
the invention, using a bacterial or yeast two hybrid system such as the Matchmaker
Two Hybrid System 2, according to the instructions of the manual which is supplied
with the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech).

The nucleic acids encoding proteins which interact with the promoter and/or
regulatory sequences of the SPG4 gene, according to the invention, can be screened
and/or selected using a one hybrid system such as that described in the manual which
is supplied with the Matchmaker One Hybrid System kit from Clontech (Catalog No.
K1603-).

In another aspect, the invention comprises the use of a nucleic acid or of a
polypeptide according to the invention, of a vector according to the invention, of a cell
according to the invention or of a mammal according to the invention, for studying the
expression or the activity of the SPG4 gene.

Other characteristics and advantages of the invention appear in the remainder 
of the description with the examples and figures, the legends of which are given 
hereinafter. 

LEGENDS OF THE FIGURES

FIGURES 1A, 1B and 1C : Physical map of the SPG4 range and genomic organization 
of SPG4. 

FIGURE 1A : The 1.5 Mb candidate region is delimited by the D2S352 and
D2S2347 genetic markers indicated in bold characters. The position of the polymorphic
markers and other STSs is indicated in standard characters, whereas the position of
the ESTs is indicated in italics. The BAC clones constituting the presequencing map
are represented by rectangles, with the name shown above and the precise size of the
clone, if it could be determined, shown below. The name of the BACs A, B, C, etc. is
followed by brackets containing the name of the clone preceded by a "b" if the clone is
derived from the BAC library CITB_978_SKB, or by a "B" if it originates from the
library RPCI-11.

FIGURE 1B : Schematic representation of the SPG4 gene which overlaps
BACs D (b336P14) and G (B563N4). The exons are shown as black rectangles with
their name above.

FIGURE 1C : The five mutations identified in seven SPG4 locus-linked AD-HSP
families are positioned in exons 7, 11 and 13 and in the splice acceptor site of intron
15.

FIGURE 2 : Nucleic acid and protein sequence of the SPG4 cDNA of spastin. 

The 17 vertical bars with a number located below represent the junctions
between the various exons. The ATG initiator codon is located at nt position 126-128
and the STOP codon for termination is located at nt position 1974-1976. Five of the
mutations identified to date, including the loss of exon 16, are indicated in italics
(nt 1210, nt 1468, nt 1520, nt 1620 and for the loss of exon 16: nt 1813-1853). The
polyadenylation site is in italics and underlined. The putative nuclear localization signal
(NLS), RGKKK, and also the three conserved domains predicted by the analysis in the
ProDom database are located at aa positions 7-11 (NLS), 342-409 (domain 92),
411-509 (domain 179) and 512-599 (domain 6226), respectively. The four motifs
predicted by the sequence comparison in the Prosite database are: two "leucine
zipper" motifs at aa positions 50-78 and 508-529, the ATP binding site (or Walker A
motif) at aa positions 382-389 and the "helix-loop-helix" dimerization domain at aa
positions 478-486. The Walker A and B motifs, "GPPGNGKT" and "IIFIDE", and also
the AAA minimum consensus [lacuna] are underlined.
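
The nucleotide and amino acid coordinates given in this legend can be cross-checked with simple arithmetic. The sketch below, which is illustrative and not part of the original description, counts the codons between the first base of the ATG (nt 126) and the first base of the STOP codon (nt 1974), and converts a cDNA nucleotide position into a codon number. The function names are hypothetical.

```python
ATG_START = 126    # first base of the initiator codon (from the legend)
STOP_START = 1974  # first base of the STOP codon (from the legend)


def orf_codons(atg_start: int, stop_start: int) -> int:
    """Number of coding codons, STOP codon excluded."""
    coding_nt = stop_start - atg_start
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("ATG and STOP are not in the same reading frame")
    return coding_nt // 3


def codon_of(nt_position: int, atg_start: int = ATG_START) -> int:
    """1-based codon number containing a cDNA nucleotide position."""
    if nt_position < atg_start:
        raise ValueError("position lies upstream of the initiator codon")
    return (nt_position - atg_start) // 3 + 1


print(orf_codons(ATG_START, STOP_START))  # 1848 coding nt, i.e. 616 codons
for nt in (1210, 1468, 1520, 1620):
    print(nt, codon_of(nt))               # codons 362, 448, 465 and 499
```

The 616 codons obtained are consistent with the highest amino acid coordinate quoted in the legend (aa 599), and the 8-residue span 382-389 matches the length of the "GPPGNGKT" motif.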

FIGURES 3A and 3B : Characterization of a splice site mutation in the affected
individuals of three SPG4 locus-linked AD-HSP families.

FIGURE 3A : PCR amplification of fragment IV of the SPG4 cDNA using
lymphoblast cDNA: well M, size marker VII (Boehringer); well 1, unaffected member of
family 2992; well 2, patient of family 2992; well 3, unaffected member of family 5330;
well 4, patient of family 5330; well 5, patient of family 5226; well 6, negative control
(human genomic DNA).

FIGURE 3B : Sequence graph for the mutation of the splice acceptor site of
intron 15.

Genomic sequence of the control individual above and of a patient of family
2992 below. The asterisk at nt position 1813-4 indicates an A->C polymorphism which
affects a nonconserved nucleotide of the splice acceptor site of intron 15 in the patient.

FIGURES 4A and 4B : Spastin homologies.

The identical residues are highlighted by shaded areas. 

FIGURE 4A : Multiple alignment created by CLUSTAL W of eight proteins
derived from various organisms and having strong sequence homology with human
spastin and murine spastin (SEQ ID No. 73).

FIGURE 4B : Alignment by CLUSTAL W of the yeast metalloproteases AFG3,
RCA1 and YME1, and of human paraplegin and spastin.

FIGURE 5: Alignment by BLASTN of the nucleic acid sequences of the SPG4 cDNA
and of its mouse ortholog Spg4 (SEQ ID No. 72). The polyadenylation site of the
murine cDNA is underlined and in italics. The STOP codon is located at nt position
1515-1517 in the murine cDNA and at nt position 1974-1976 in the human cDNA.

FIGURES 6A, 6B and 6C : PCR analysis of the expression of SPG4 and of its murine
ortholog Spg4.

FIGURE 6A : Collection of cDNA originating from multiple mouse tissues.
Well M, size marker V (Boehringer); well 1, heart; well 2, brain; well 3, spleen;
well 4, lung; well 5, liver; well 6, skeletal muscle; well 7, kidney; well 8, testicle; well 9,
E7 7-day embryo; well 10, E11 11-day embryo; well 11, E15 15-day embryo; well 12,
E17 17-day embryo; well 13, negative control (mouse genomic DNA).

FIGURE 6B : Collection of cDNA originating from multiple human tissues.
Well M, size marker VII (Boehringer); well 1, brain; well 2, heart; well 3, kidney;
well 4, liver; well 5, lung; well 6, pancreas; well 7, placenta; well 8, skeletal muscle;
well 9, negative control (human genomic DNA); well 10, negative control (no DNA).

FIGURE 6C : Collection of cDNA originating from multiple human fetal tissues.
Well M, size marker VII (Boehringer); well 1, brain; well 2, heart; well 3, kidney;
well 4, liver; well 5, lung; well 6, skeletal muscle; well 7, spleen; well 8, thymus; well 9,
negative control (human genomic DNA); well 10, negative control (no DNA).

EXAMPLES

Example 1: Materials and methods 

1 ) Subcloning and sequencing of the candidate region 

Twelve BACs originating from two human genomic libraries, CITB_978_SKB
(sold by Research Genetics) and RPCI-11 (Osoegawa et al., 1998), and covering the
SPG4 range, were selected to be sequenced (Hazan et al., Genomics, 60 (3), 309-19,
1999). 40 µg of the DNA of each BAC were partially digested with the CviJI restriction
enzyme (CHIMERx) and separated by electrophoresis on 0.4% LMP agarose gel
(FMC). DNA fractions, the sizes of which vary in the region of 3, 5 and 10 kb, were
eluted with β-agarase (Biolabs) and ligated to a plasmid vector pBAM3, which had
been digested with SmaI and dephosphorylated beforehand, in a ratio of 1 x insert per
5 x vector. Electrocompetent E. coli DH10B bacteria (GIBCO-BRL) were transformed
with the various ligations, by electroporation. Approximately 1 000 to 1 500 subclones
per BAC (8 to 10 equivalent genomes), consisting of 20% of clones with inserts of
10 kb, 40% of clones with inserts of 5 kb and 40% of clones with inserts of 3 kb, were
isolated. The ends of the inserts of these clones were sequenced on a LI-COR 4200
automatic sequencer. For each BAC, the sequences were assembled into a backbone
consisting of several contigs, using the Phred and Phrap programs. The gaps between
the contigs were sequenced with labeled dideoxynucleotides on an ABI 377 sequencer
(PE-Applied Biosystems). The exons contained in these sequence contigs were
predicted with the GRAIL II, GENSCAN, FGENEH and Genie computer programs. The
sequences were also compared in the EMBL and GenBank nucleic acid and protein
databases, with the BLASTN and BLASTX programs. The determination of the
promoter sequences was carried out using the TSSG and TSSW computer programs.
The results of all these sequence analyses were visualized using the Genotator
sequence annotation program.

2) cDNA cloning 

The cDNA of the SPG4 gene was isolated through 5' and 3' RACE-PCR
experiments on polyA+ RNAs of fetal brain, adult brain and adult liver, using the
Marathon cDNA amplification kit (Clontech) according to the supplier's instructions. A
first PCR followed by an internal PCR were carried out with various pairs of primers,
the sequences of which are indicated in Table 1 hereinafter:

Table 1

Primers used for the RACE-PCRs and the cDNA amplifications 



Primer       Sequence (5'-3')                               5' position   Pair / PCR product size

SPA_5RACE5   CGGAGCTCCTCTTGGCTGCCATG (SEQ ID No. 4)         nt 405
SPA_5RACE6   AGAAGCGCTGGCAGAGCCACACGAAG (SEQ ID No. 5)      nt 372
SPA_5RACE7   AAGGCGACCAAACGCAGCAGCGCGAAG (SEQ ID No. 6)     nt 331
SPA_3RACE1   AGGAGCAAGCTGTGGAATGGTATAAG (SEQ ID No. 7)      nt 550
SPA_3RACE2   TGGTTATGGCCAAGGACCGCTTACAAC (SEQ ID No. 8)     nt 689
SPA_3RACE3   CAAACGGACGTCTATAATGACAGTAC (SEQ ID No. 9)      nt 747
SPA_3RACE4   TTAGGAATGTGGACAGCAACCTTGC (SEQ ID No. 10)      nt 1075
SPA_3RACE5   CTTCTCTGAGGCCTGAGTTGTTCAC (SEQ ID No. 11)      nt 1207
SPA_3RACE6   TGCTAGAATGACTGATGGATACTCAGG (SEQ ID No. 12)    nt 1736
SPA_3RACE7   AGATGCAGCACTGGGTCCTATCCG (SEQ ID No. 13)       nt 1787
SPA_3RACE8   ATGAACGTCATCGGCTACAGAAACAG (SEQ ID No. 14)     nt 2037

SPA_Db       TAGCAGTGGCTGCCGCCGT (SEQ ID No. 15)            nt 45         b+m / 655 bp
SPA_Dm       AAGCGGTCCTTGGCCATAAC (SEQ ID No. 16)           nt 700
SPA_Dc       GGCGGCAGTGAGAGCTGTG (SEQ ID No. 17)            nt 106        c+n / 543 bp
SPA_Dn       CTAGCTCTTTCACACTGTTC (SEQ ID No. 18)           nt 649
SPA_Ad       AACAGGCCTTCGAGTACATC (SEQ ID No. 19)           nt 487        d+n / 746 bp
SPA_Am       CTGTGAACAACTCAGGCCTC (SEQ ID No. 20)           nt 1233
SPA_Ac       ATGAGAAAGCAGGACAGAAG (SEQ ID No. 21)           nt 532
SPA_An       TGCCAAGTCTTGACCAGC (SEQ ID No. 22)             nt 1175
SPA_Ba       CTACAACTGCTACTCGTAAG (SEQ ID No. 23)           nt 1036       a+m / 763 bp
SPA_Bm       CAGTGCTGCATCTTTTGCC (SEQ ID No. 24)            nt 1799
SPA_Bb       TAGGAATGTGGACAGCAACC (SEQ ID No. 25)           nt 1076
SPA_Bn       AAAGCTGTTAGGTCACTTCC (SEQ ID No. 26)           nt 1780
SPA_Ca       TGGAGATGACAGAGTACTTG (SEQ ID No. 27)           nt 1550       a+m / 766 bp
SPA_Cm       CTGGAATACTTTCATCTGC (SEQ ID No. 28)            nt 2316
SPA_Cb       ATGAGGCTGTTCTCAGGCG (SEQ ID No. 29)            nt 1603
The RACE-PCR products were cloned with the TA-cloning kit (Invitrogen) and
the corresponding clones were sequenced on an ABI 377 (PE-Applied Biosystems).
The sequence of the SPG4 transcript was verified by sequencing PCR products
amplified from a cDNA population originating from the lymphoblasts of 6 healthy
individuals.
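
As a consistency check on Table 1, the size of each cDNA PCR product can be recomputed from the 5' positions of the two primers of a pair. The sketch below is illustrative only: the function name `product_size` and the dictionary layout are assumptions introduced here, the positions are those listed in Table 1, and the size is taken simply as the distance between the two 5' positions.

```python
# 5' positions (nt, SPG4 cDNA numbering) of the primer pairs listed in Table 1.
PRIMER_PAIRS = {
    "SPA_Db/SPA_Dm": (45, 700),
    "SPA_Dc/SPA_Dn": (106, 649),
    "SPA_Ad/SPA_Am": (487, 1233),
    "SPA_Ba/SPA_Bm": (1036, 1799),
    "SPA_Ca/SPA_Cm": (1550, 2316),
}


def product_size(forward_pos, reverse_pos):
    """Size in bp, taken as the distance between the two 5' positions."""
    return reverse_pos - forward_pos


SIZES = {pair: product_size(*pos) for pair, pos in PRIMER_PAIRS.items()}

for pair, size in SIZES.items():
    print(f"{pair}: {size} bp")
```

Under this convention the recomputed sizes agree with the product sizes given in Table 1 for the five pairs.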

3) Detection of mutations 

The total RNAs were extracted from lymphoblast lines of one affected individual
per family studied and of 6 control individuals, using the RNA PLUS® kit (Bioprobe
Systems). The cDNA synthesis was carried out on 500 ng to 1 µg of RNA, with 100 pmol
of random hexameric primers (Pharmacia) and 200 units of Superscript II reverse
transcriptase (Gibco BRL), under standard conditions. Four PCR amplifications,
generating overlapping fragments which cover all of the SPG4 open reading frame,
were carried out on the cDNAs of the patients and controls. Fragment I was amplified
with the SPA_Db/SPA_Dm primers, and then by internal PCR with the
SPA_Dc/SPA_Dn primers. Fragments II, III and IV were amplified with the
SPA_Ad/SPA_Am, SPA_Ba/SPA_Bm and SPA_Ca/SPA_Cm primers, respectively (cf. the
sequences of these primers in Table 1). Each amplification was carried
out in a total volume of 50 µl containing 4 µl of cDNA (~ 1/7th of the prep.), 20 pmol of
each primer, 200 µM of dNTPs, 50 mM of KCl, 10 mM of Tris, pH 9, 1.5 mM MgCl2,
0.1% of Triton X-100, 0.01% of gelatin and 2.5 units of Taq polymerase (Cetus-PE). The
PCR reactions were carried out according to the "hot start" process: the Taq
polymerase was added at 92°C, after a first denaturation step of 5 min at 94°C. The
samples were subsequently subjected to 35 cycles of denaturation (94°C for 40 sec), of
hybridization (55°C for 50 sec, with the exception of fragment I: 58°C for 50 sec) and of
elongation (72°C for 1 min), followed by a final elongation step (5 min at 72°C). The
PCR products were sequenced on an ABI 377 automatic sequencer (PE-Applied
Biosystems), with the SPA_Dc/SPA_Dn, SPA_Ac/SPA_An, SPA_Bb/SPA_Bn and
SPA_Cb/SPA_Cm primers for fragments I, II, III and IV, respectively.
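
By way of illustration, the quantities stated for the cDNA amplifications can be converted into a final primer concentration and a total programmed hold time. This is a minimal sketch under stated assumptions: the helper names are hypothetical, ramping time between temperature steps is ignored, and the values are those quoted above (20 pmol of primer in 50 µl; 5 min, then 35 cycles of 40 sec, 50 sec and 1 min, then 5 min).

```python
def micromolar(amount_pmol, volume_ul):
    """Concentration in µM: 1 pmol per µl equals 1 µmol per litre."""
    return amount_pmol / volume_ul


def hold_time_s(initial_s, cycles, denat_s, anneal_s, elong_s, final_s):
    """Total programmed hold time in seconds, ignoring ramping between steps."""
    return initial_s + cycles * (denat_s + anneal_s + elong_s) + final_s


# 20 pmol of each primer in a 50 µl reaction.
PRIMER_UM = micromolar(20, 50)

# 5 min at 94°C, 35 cycles of 40 sec / 50 sec / 1 min, final 5 min at 72°C.
CDNA_PCR_S = hold_time_s(5 * 60, 35, 40, 50, 60, 5 * 60)

print(f"primer concentration: {PRIMER_UM} µM")
print(f"programmed hold time: {CDNA_PCR_S / 60} min")
```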

The mutations were also sought or confirmed by sequencing the 17 predicted
exons of the SPG4 gene in the patients and controls. Each exon was amplified with the
corresponding "a+m" pair of primers (cf. Table 2 hereinafter), with the exception of
exon 1 (gSPAex1c/gSPAex1m), and exons 10, 11 and 12 which were co-amplified with
the gSPAex10a/gSPAex12m and gSPAex11a/gSPAex12m pairs of primers.
Table 2

PCR primers for amplifying and sequencing the exons

Exon    Product size   PCR program   Primer        Sequence (5'-3') (SEQ ID Nos. 30 to 71)

1       1048 bp        [illegible]   gSPAex1c      [illegible]
                                     gSPAex1m      [illegible]
                                     [illegible]   [illegible]
                                     gSPAex1n      [illegible]
2       624 bp         3             gSPAex2a      [illegible]
                                     gSPAex2m      [illegible]
                                     gSPAex2b      [illegible]
3       819 bp         [illegible]   gSPAex3a      [illegible]
                                     gSPAex3m      [illegible]
4       379 bp         [illegible]   gSPAex4a      [illegible]
                                     gSPAex4m      [illegible]
                                     gSPAex4n      [illegible]
5       [illegible]    [illegible]   gSPAex5a      [illegible]
                                     gSPAex5m      [illegible]
                                     gSPAex5b      [illegible]
6       484 bp         3             gSPAex6a      [illegible]
                                     gSPAex6m      [illegible]
7       420 bp         2             gSPAex7a      GTCATAGGGCTTAGGCTTC
                                     gSPAex7m      ATCATACTACCCAC[4 nt illegible]CC
8       647 bp         3             gSPAex8a      TGTTTGGGAAGATGCTACTG
                                     gSPAex8m      CTACTGAAGATAACGTACATG
9       1268 bp        1             gSPAex9a      CATTGATTGCCATGTATTGG
                                     gSPAex9m      AGAAGGCCAGAAATACTCAG
                                     gSPAex9b      GTACTTAAATCGGTAAATATGG
10-12   1061 bp        4             gSPAex10a     CTCAAGTCTTAGGAATGCAG
                                     gSPAex10b     GCACTTAACCAGGCTGTATG
        551 bp         3             gSPAex11a     CTCAGATGACTCACATAGC
                                     gSPAex12m     CTTTACTAGACTAATTCTCCTG
13      1361 bp        4             gSPAex13a     [illegible]
                                     gSPAex13m     [illegible]
                                     gSPAex13n     [illegible]
14      985 bp         4             gSPAex14a     [illegible]
                                     gSPAex14m     [illegible]
                                     [illegible]   [illegible]
15      1076 bp        [illegible]   gSPAex15a     [illegible]
                                     gSPAex15m     CTAGAACAGGGGTCACAGTC
                                     gSPAex15n     TTGGACTTCTTAAACTTC
16      1404 bp        4             gSPAex16a     GCAGTATGCAAGAAATTGAAC
                                     gSPAex16m     GGCCTGTAA[4 nt illegible]CTTCTG
                                     gSPAex16b     GTACTGAATAGATACATGTAG
17      445 bp         3             gSPAex17a     GTGTAGCAGATCAACATAG
                                     gSPAex17m     CATCTTCAAGTTTGGTGCAC



Other than for exon 1, which was amplified using the Advantage GC genomic
PCR kit (Clontech) according to the supplier's instructions, four slightly different PCR
programs (1, 2, 3 and 4) were used to amplify the SPG4 exons (see Table 2). The
amplifications were all carried out in a volume of 50 µl containing 100 ng of genomic
DNA, 50 pmol of each primer, 250 µM of dNTPs, 1X Takara buffer and 1 unit of Takara
LA Taq polymerase (Shuzo Co.). The PCR reactions were carried out according to
the "hot start" process: the Taq polymerase was added at 94°C, after a first denaturation
step of 5 min at 96°C. The samples were subsequently subjected to 30 cycles of
denaturation (94°C for 40 sec), of hybridization (prog. 1: 60°C for 50 sec; prog. 2: 58°C
for 50 sec; prog. 3 and 4: 55°C for 50 sec) and of elongation (prog. 1 and 4: 72°C for
1 min; prog. 2 and 3: 72°C for 40 sec), followed by a final elongation step (10 min at
72°C). The sequencing of these PCR products was carried out on an ABI 377
sequencer (PE-Applied Biosystems), using either the PCR primers or the internal
primers termed "b" and "n" (see Table 2).
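
The four genomic PCR programs differ only in hybridization temperature and elongation time, which can be summarized in a small lookup table. The sketch below is purely illustrative: the `Program` class and the `hold_time_s` helper are hypothetical names, ramping time is ignored, and the shared steps (5 min at 96°C, 30 cycles with 40 sec at 94°C and 50 sec of hybridization, final 10 min at 72°C) are those stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    anneal_c: int  # hybridization temperature in °C, held for 50 sec
    elong_s: int   # elongation time at 72°C, in seconds


# Programs 1 to 4 as described for the amplification of the SPG4 exons.
PROGRAMS = {
    1: Program(anneal_c=60, elong_s=60),
    2: Program(anneal_c=58, elong_s=40),
    3: Program(anneal_c=55, elong_s=40),
    4: Program(anneal_c=55, elong_s=60),
}


def hold_time_s(program, cycles=30):
    """Programmed hold time in seconds: initial step, cycles, final elongation."""
    return 5 * 60 + cycles * (40 + 50 + program.elong_s) + 10 * 60


for number, program in PROGRAMS.items():
    minutes = hold_time_s(program) / 60
    print(f"prog. {number}: {program.anneal_c}°C, {minutes} min")
```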

4) Characterization of SPG4

The cDNA clones 977312 (EST AA560327) and 568234 (EST AA107866),
derived from the mouse blastocyst and E8 embryo cDNA libraries, which both
correspond to the murine ortholog of SPG4, were obtained from the IMAGE consortium
and sequenced in the laboratory on an ABI 377 sequencer (PE-Applied Biosystems). In
order to analyze the expression profile of SPG4 and of its murine ortholog Spg4,
collections of cDNA from various fetal and adult human tissues, and also from mouse
tissues (MTC panels, Clontech), were tested by PCR according to the supplier's
protocol, with the SPA_Ca/SPA_Cm pair of primers for the human cDNAs and the
SPA_Ca/spam (spam: 5'-ACCGAAGTCAAGAGCCTATC-3') pair for the mouse
cDNAs. The PCR conditions were those used for amplifying SPG4 from lymphoblast line
cDNA (cf. § Detection of mutations), except that the samples were subjected to
32 cycles for the cDNAs derived from adult human tissues and from mouse tissues,
and to 28 cycles for the cDNAs derived from fetal tissues. The amplification products
were separated by electrophoresis on 2% agarose gels.

5) Histological analysis of a muscle biopsy from a patient 

The histological and histo-enzymatic analyses were carried out on a muscle
biopsy from a patient derived from an SPG4 locus-linked family according to the
standard techniques described in Casari et al., 1998.

6) Accession numbers in the public databases 

The SPG4 (or SPAST) cDNA and the deduced protein sequence, 
GenBank/EMBL AJ246001; the incomplete Spg4 cDNA clone, GenBank/EMBL 
AJ246002; the SPG4 (or SPAST) gene, GenBank/EMBL AJ246003. 

Example 2 : Analysis of the sequence of the SPG4 range

The analysis of the recombination events made it possible to reduce the SPG4
candidate region to a genetic range of 0 cM between the D2S352 and D2S2347
markers (19, 20). A presequencing map of the SPG4 range composed of 37 BACs was
constructed (Hazan et al., in press in Genomics); the candidate region covers a
physical distance of approximately 1.5 Mb. Twelve overlapping BACs, stretching
over the SPG4 region, with the exception of a single 4 kb gap between clones A and
E, were selected to be sequenced (fig. 1A). Seven of these BACs (A, B, C, D, E, F and
G), covering approximately 70% of the region of interest, have already been
sequenced. The sequences of these 7 BACs were compared with those of the nucleic
acid and protein databases, and analyzed with four exon prediction programs. These
preliminary sequence analyses made it possible to reveal 14 potential transcription
units, including three corresponding to the genes encoding xanthine dehydrogenase,
steroid 5α-reductase 2 and a TGFβ-binding protein. Of the 14 genes detected by the
sequence analysis, 9 had been previously identified in the EST (for "Expressed
Sequence Tag") databases and located in the SPG4 range (Hazan et al., in press in
Genomics); the 5 remaining genes could only be identified by sequencing the
candidate region. One of these 5 novel genes showed homology, in the 3' part of its coding
region, with the genes encoding the AAA protein family (Confalonieri et al., 1995). More
thorough sequence analyses showed that this gene, named SPG4 (or SPAST), was
composed of 17 exons and extended over a region of approximately 90 kb, covered by
two adjacent BAC clones, D and G (cf. fig. 1B). The first three predicted exons of this
gene were identified in BAC D, by two of the four exon prediction programs used,
GRAIL II and GENSCAN; they show strong homology with a mouse blastocyst EST,
AA560327. The last 14 exons are found in BAC G. The protein sequence deduced
from exons 7 to 17 is significantly homologous to a subclass of the AAA family, which
includes the Yta6p (Schnall et al., 1994), TBP6 (Schnall et al., 1994) and END13 yeast
proteins, and also the SKD1 mouse protein (Perier et al., 1994).

Of the four exon prediction programs, FGENEH appears to be the most reliable
and the most powerful, enabling detection of most of the genes of this chromosomal
region at 2p21-p22. This observation also applies to the SPG4 gene, for which 15
exons could be demonstrated using this program, while only 4, 9 or 11 exons could be
located using the Genie, GRAIL II and GENSCAN programs, respectively. The
genomic organization of this gene (fig. 1B) could subsequently be confirmed by
determining the sequence of the SPG4 cDNA. The intron/exon junctions are
represented in table 3 hereinafter: the exon size ranges from 41 bp (exon 16) to
1.410 kb (exon 17), that of the introns ranging from 140 bp (intron 11) to 23.247 kb
(intron 1).
Example 3 : Identification of the SPG4 cDNA

Several successive amplifications by 5' and 3' RACE-PCR were carried out on
collections of adult liver and brain and fetal brain cDNA, in order to characterize the SPG4
transcript. All the 5' RACE-PCRs gave amplification products terminating at nt position
263 of the SPG4 cDNA (fig. 2), which was probably due to the high GC content of the 5'
region of the transcript (90% of GC in the 60 bp preceding nt position 263). Four
overlapping PCR products, covering all of the coding region, were amplified from the
cDNAs derived from the lymphoblasts of six control individuals, and entirely sequenced
with the aim of verifying the sequence of the SPG4 transcript. Aligning the sequences of
all the PCR and RACE-PCR products made it possible to reconstitute a 3263 bp
sequence comprising a 1848 bp open reading frame preceded by a 125 bp untranslated 5'
region (5' UTR for "5' UnTranslated Region") and followed by a 1290 bp 3' UTR
including a polyadenylation site between nt positions 3227-3232, ~ 35 bp upstream of the
polyA tail (fig. 2). Comparing the sequence of the SPG4 cDNA with the EST databanks
made it possible to detect significant homology with 6 human ESTs, including
EST N47973 which contains a more extended 3' noncoding region (+ 180 bp) comprising
a second polyadenylation site. The translation initiation site was identified by the presence
of a Kozak consensus sequence (CTGTGAatgA) defined as a "suitable context" for
translation initiation given that a purine is located 3 nt upstream of the initiator ATG, itself
preceded by a STOP codon. The 3263 bp cDNA sequence is identical to the transcribed
sequence deduced from the 17 exons of the SPG4 gene. The analysis of the sequence of
the 5' region using the TSSG and TSSW computer programs suggests the presence of a
promoter sequence of the TATA box type located 43 bp upstream of nt position 1 of
exon 1.
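
The layout of the reconstituted cDNA can be checked arithmetically from the three lengths given above. The following sketch is an illustration only: the function `segments` and the constant names are assumptions introduced here, and coordinates are taken as 1-based and inclusive, with the open reading frame starting immediately after the 125 bp 5' UTR.

```python
CDNA_LENGTH = 3263  # bp, reconstituted SPG4 cDNA
UTR5_LENGTH = 125   # bp, 5' untranslated region
ORF_LENGTH = 1848   # bp, open reading frame
UTR3_LENGTH = 1290  # bp, 3' untranslated region


def segments():
    """Return 1-based inclusive (start, end) nt coordinates of each segment."""
    orf_start = UTR5_LENGTH + 1
    orf_end = UTR5_LENGTH + ORF_LENGTH
    return {
        "5' UTR": (1, UTR5_LENGTH),
        "ORF": (orf_start, orf_end),
        "3' UTR": (orf_end + 1, orf_end + UTR3_LENGTH),
    }


LAYOUT = segments()
CODONS = ORF_LENGTH // 3  # number of triplets in the open reading frame

for name, (start, end) in LAYOUT.items():
    print(f"{name}: nt {start}-{end}")
print(f"{CODONS} codons")
```

With these assumptions the three segments add up to the stated 3263 bp, and the 1848 bp open reading frame corresponds to 616 triplets.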

Example 4 : Mutations in the SPG4 gene

Heterozygous mutations were sought in the SPG4 cDNA originating from
lymphoblasts of 14 patients derived from SPG4 locus-linked families (1 affected individual
per family). Four overlapping PCR fragments, I, II, III and IV, covering the open reading
frame of the SPG4 cDNA, were amplified and sequenced in the 14 patients, and also in 6
healthy control individuals. The agarose gel electrophoresis of PCR fragment IV showed
three bands of equal intensity in 3 patients from families 2992, 5226 and 5330 originating
from the same region of Switzerland, which would suggest a microdeletion or a mutation
of a splice site; the two additional bands were not present in 2 healthy individuals derived
from families 2992 and 5330 (fig. 3A). The genomic sequence of exon 16 revealed a
heterozygous A->G mutation of the splice acceptor site (AG) of intron 15 in the affected
individuals of these three families (fig. 3B); this mutation engenders the loss of exon 16,
followed by a reading frame shift in the abnormal transcript. None of the healthy members,
including husbands and wives, carry this mutation of the splice site. The identification of
the same mutation in all the affected members of these three Swiss families demonstrates
the existence of a common ancestor, which had probably been suggested by the study of
the haplotypes.
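
The reading frame shift that follows the loss of exon 16 can be verified from the size of that exon. This is a minimal sketch, not part of the experimental method: the helper names are hypothetical, and the exon 16 boundaries (nt 1813 to 1853 of the cDNA) are an assumption inferred from the names of the flanking splice site mutations (1813-2 and 1853+1), consistent with the 41 bp size stated for exon 16.

```python
# Assumed cDNA coordinates of exon 16, inferred from the flanking splice site
# mutation names 1813-2 (acceptor side) and 1853+1 (donor side).
EXON16_START = 1813
EXON16_END = 1853


def exon_length(start, end):
    """Length in bp of an exon given 1-based inclusive cDNA coordinates."""
    return end - start + 1


def skipping_shifts_frame(length_bp):
    """Skipping an exon shifts the frame unless its length is a multiple of 3."""
    return length_bp % 3 != 0


EXON16_LENGTH = exon_length(EXON16_START, EXON16_END)
print(EXON16_LENGTH, skipping_shifts_frame(EXON16_LENGTH))
```

Since 41 is not a multiple of 3, removing exon 16 from the transcript displaces the downstream reading frame, in agreement with the frame shift described above.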

Three point mutations, 1210C->G, 1468G->A and 1620C->T, which introduced 
amino acid substitutions into the protein sequence (S362C, C448Y and R499C), were 
respectively revealed by sequencing PCR fragments III and IV in the affected individuals 
of families 624, 4014 and 618. These three substitutions all involve a cysteine residue, 
inducing the loss or insertion of a cysteine in the protein sequence. A 1 bp deletion,
1520delT, which introduces a premature STOP codon resulting in a truncated protein
composed of 465 amino acids (aa), was detected in the affected individuals of family A.
None of the five mutations summarized in table 4 hereinafter was found in the control 
individuals tested, whether they belong to the healthy siblings or to the spouses of the 
seven families analyzed herein. These five mutations significantly affect the protein 
sequence in a very conserved domain, or AAA cassette (Beyer, 1997), which is composed 
of several protein motifs presumed to be responsible for the ATPase activity in all the 
members of the AAA family. 
In addition to the five mutations described above, searches for heterozygous
mutations, carried out on patients suffering from AD-HSP derived from 36 other families,
made it possible to reveal 34 other mutations which modified or were likely to modify the
product of expression of the SPG4 gene.
The characteristics of these 34 other mutations are summarized in table 5
hereinafter, into which the first five mutations mentioned above have also been inserted.
Table 5

Mutations in SPG4 in the patients suffering from AD-HSP

Family        Location      Mutation a          Amino acid change b              Consequence

624           exon 7        1210 C->G           S362C                            missense
6958          exon 8        1233 G->A           G370R                            missense
214           exon 8        1267 T->G           F381C                            missense
1002          exon 8        1283 T->G           N386K                            missense
027           exon 8        1288 A->G           K388R                            missense
[illegible]   exon 10       1401 C->G           L426V                            missense
4014          exon 11       1468 G->A           C448Y                            missense
148           exon 11       1504 G->T           R460L                            missense
618           exon 13       1620 C->T           R499C                            missense
[illegible]   exon 15       1788 G->A           D555N                            missense
[illegible]   exon 15       1792 C->T           A556V                            missense

2971          exon 3        702 C->T            Q193STOP                         nonsense
3655          exon 5        [illegible]         K229STOP                         nonsense
1010          exon 5        907 C->A            S261STOP                         nonsense
3938          exon 5        932 C->G            Y269STOP                         nonsense
6922          exon 10       1416 C->T           R431STOP                         nonsense
616           exon 10       1416 C->T           R431STOP                         nonsense
605           exon 15       1809 C->T           R562STOP                         nonsense

030           exon 2        578-579insA         PTC + 2 aa                       shift + nonsense
615           exon 5        852del11            PTC + 18 aa                      shift + nonsense
042           exon 5        882-883insA         PTC + 12 aa                      shift + nonsense
032           exon 5        906delT             PTC + 17 aa                      shift + nonsense
189           exon 9        1299delG            PTC + 3 aa                       shift + nonsense
3686          exon 9        1340del5            PTC + 35 aa                      shift + nonsense
625           exon 9        [illegible]         PTC + 35 aa                      shift + nonsense
A             [illegible]   1520delT            PTC + [illegible] aa             shift + nonsense
115           exon 12       1574delGG           PTC + 2 aa                       shift + nonsense
3266          exon 13       1634del22           PTC + 18 aa                      shift + nonsense
149           exon 14       1684-1685insTT      PTC + 9 aa                       shift + nonsense
645           exon 14       1685del4            PTC + 7 aa                       shift + nonsense

029           intron 4      808-2 a->g          ?                                splice site mutation
162           intron 6      [illegible]         ?                                splice site mutation
125           intron 7      1223+1 g->t         ?                                splice site mutation
143           intron 8      1299+1 g->a         ?                                splice site mutation
1620          intron 11     1538+5 g->a         (PTC + 6 aa)                     loss of exon 11 + shift
1006          intron 11     1538+3 del4         ?                                splice site mutation
1605          intron 13     1661+1 g->t         ?                                splice site mutation
1012          intron 13     1662-2 a->t         ?                                splice site mutation
1626          intron 15     1812+1 g->a         ?                                splice site mutation
2992          intron 15     1813-2 a->g         Δ aa564 -> aa576 (PTC + 7 aa)    loss of exon 16 + shift
5226          intron 15     1813-2 a->g         Δ aa564 -> aa576 (PTC + 7 aa)    loss of exon 16 + shift
5330          intron 15     1813-2 a->g         Δ aa564 -> aa576 (PTC + 7 aa)    loss of exon 16 + shift
1611          intron 16     1853+1 g->a         ?                                splice site mutation


a The nt positions refer to the sequence of the SPG4 cDNA.
b The aa positions refer to the spastin sequence.
The exon bases are indicated in upper case, those of the introns in lower case. PTC + n aa: "premature
termination codon" at n amino acids downstream of the mutation.
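
The relation between the nt positions of footnote a and the aa positions of footnote b can be sketched as follows. This is an illustrative helper, not a described method: the name `codon_of` is hypothetical, and the start of the open reading frame at nt 126 is an assumption derived from the 125 bp 5' UTR reported for the SPG4 cDNA.

```python
ORF_START = 126  # assumed first nt of the initiator ATG (125 bp 5' UTR + 1)


def codon_of(nt):
    """Return (amino acid number, position within the codon, 1 to 3)."""
    offset = nt - ORF_START
    if offset < 0:
        raise ValueError("position lies upstream of the open reading frame")
    return offset // 3 + 1, offset % 3 + 1


# Point mutations named in the description, with their reported aa changes.
EXAMPLES = {1210: "S362C", 1468: "C448Y", 1620: "R499C"}

for nt, change in EXAMPLES.items():
    aa, pos = codon_of(nt)
    print(f"nt {nt}: codon {aa}, position {pos} ({change})")
```

Under this assumption, nt 1210, 1468 and 1620 fall in codons 362, 448 and 499, which matches the amino acid changes S362C, C448Y and R499C.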






Example 5 : Analysis of the protein sequence of spastin

The open reading frame of SPG4 encodes a 616 aa protein which we have named
spastin and the molecular weight of which is approximately 67.2 kDaltons (kD). The
comparison of this amino acid sequence in the protein databases, using the BLAST
programs, made it possible to reveal a region of strong homology with several members of
the AAA family, at the C-terminal end of spastin. The "typical" motifs of the AAA family,
encompassed in the AAA cassette, are located between aa positions 342 and 599 (see
fig. 2) according to the sequence comparisons in the ProDom and Prosite protein domain
databases. The three conserved typical domains, including the Walker A and B motifs and
also the minimum consensus motif of the AAA proteins, are located in the AAA cassette at
aa positions 382-389, 437-442 and 480-498, respectively (fig. 2). The Walker A motif,
"GPPGNGKT", also called P-loop, which corresponds to the ATP-binding domain, and the
B motif, "IIFIDE", are very conserved among all the members of the AAA family, including
spastin.
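
The motif coordinates given above lend themselves to a simple lookup of which conserved regions contain a given residue, for instance a residue affected by a missense mutation. The sketch below is illustrative only: the function `motifs_at` and the dictionary are hypothetical names, and the intervals are the aa positions quoted in this example, taken as inclusive.

```python
# Inclusive aa intervals of the conserved regions of spastin quoted above.
MOTIFS = {
    "AAA cassette": (342, 599),
    "Walker A": (382, 389),
    "Walker B": (437, 442),
    "AAA minimal consensus": (480, 498),
}


def motifs_at(aa):
    """Names of all listed regions whose interval contains residue `aa`."""
    return [name for name, (start, end) in MOTIFS.items() if start <= aa <= end]


# Motif sequences quoted in the text should match the interval lengths.
WALKER_A_SEQ = "GPPGNGKT"
WALKER_B_SEQ = "IIFIDE"

for residue in (362, 388, 193):
    print(residue, motifs_at(residue))
```

For example, residue 388 (substituted in K388R) lies in the Walker A motif, whereas residue 362 (S362C) lies in the AAA cassette outside the three listed motifs.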

The comparison of the AAA cassettes present in 150 proteins of this ATPase
family, derived from organisms which are very far apart in evolution, made it possible to
classify this set of proteins into several subgroups, as a function of the number of AAA
cassettes identified (1 or 2) and of the sequence homologies between these various
cassettes (Beyer, 1997). Among all the proteins of the AAA family, spastin shows stronger
homology with a particular subclass of the AAAs, and more specifically with the following
proteins, most of which were identified through the complete sequencing of the genome of
the organism in question: two proteins of Caenorhabditis elegans, O16299 and Q18128;
two subunits of the 26S proteasome of Saccharomyces cerevisiae, Yta6p (Q02845) and
TBP6 (P40328) (Schnall et al., 1994); a subunit of the proteasome of
Schizosaccharomyces pombe (O43078); the SAP1 (P39955) and END13 (P52917)
proteins of S. cerevisiae and the murine SKD1 protein (P46467) (Perier et al., 1994). The
multiple alignment of these 8 proteins with spastin is represented in fig. 4A. Of the 257
amino acids encompassing the AAA cassette (aa positions 342-599), spastin shows 52%,
51% and 50% sequence identity with the Yta6p (Q02845) yeast protein, the O16299
nematode protein and the TBP6 (P40328) yeast protein, respectively. Similar results were
obtained by analyzing the protein sequence of spastin in the ProDom database, which
showed the existence of three domains of homology (named 92, 179 and 6226, and
corresponding to aa positions 342-409, 411-509 and 512-599) found in the putative
subunits of the 26S proteasome of yeast. In addition, the members of this AAA subgroup
most commonly contain motifs of the leucine-zipper type, two of which could be detected
in the protein sequence of spastin at aa positions 50-78 and 508-529, by analyzing the
sequence in the Prosite database (see fig. 2). This analysis was also able to predict the
presence of a dimerization motif of the helix-loop-helix type, located between aa positions
478 and 486.

The comparison of the protein sequence of spastin with those of the mitochondrial
metalloproteases, such as the AFG3, RCA1 and YME1 yeast proteins, and also
paraplegin, which is implicated in a rare form of AR-HSP, shows that the homology
between these five members of the AAA family is limited to the 257 aa region
encompassing the AAA cassette (fig. 4B). In this region, the sequence identity between
spastin and paraplegin is only 29%, whereas paraplegin and the AFG3 yeast protein are
57% identical over this same portion of the protein sequence. This sequence comparison
suggests that spastin does not belong to the same AAA subgroup as paraplegin and other
mitochondrial metalloproteases. In addition, the computer analysis of the spastin
sequence using the PSORT II program, which makes it possible to predict the subcellular
location of the proteins, appears to indicate that spastin is a nuclear protein. A possible
nuclear localization signal (NLS), RGKKK, was revealed between aa positions 7 and 11,
whereas no signal peptide characteristic of importation into mitochondria could be
detected, unlike what had been observed for paraplegin.

Example 6 : Expression profiles for SPG4 and for its murine ortholog Spg4

The comparison of the nucleic acid sequence of SPG4 in the EST databanks
made it possible to detect several human, murine and rat ESTs showing strong homology
with SPG4. The mouse blastocyst and E8 embryo cDNA clones corresponding to two of
the murine ESTs, AA560327 and AA107866, were obtained from the IMAGE consortium
and entirely sequenced. The assembly of the sequences of these cDNA clones made it
possible to reconstitute a 1689 bp consensus sequence including a 1514 bp incomplete
open reading frame. The comparison between the human SPG4 cDNA and this mouse
cDNA showed that the murine transcript lacks approximately 460 bp at the 5' end,
including the translation initiation codon. The mouse open reading frame is followed by a
175 bp 3' noncoding region (3' UTR) containing a polyadenylation site located ~20 bp
upstream of the polyA tail (fig. 5). The nucleic acid sequence of SPG4 and the protein
sequence of human spastin show 89% (between nt positions 460 and 1982) and 96%
(between aa positions 113 and 616) identity, respectively, with the mouse cDNA and
deduced protein sequences. This considerable degree of homology makes it possible to
affirm that this mouse transcript corresponds to the murine ortholog of SPG4, which was
therefore named Spg4.
The hybridization of Northern blots comprising the mRNAs of various human and
murine tissues (Clontech) with the SPG4 and Spg4 cDNA clones did not give any
convincing results, except a very weak band corresponding to a 2.5 kb transcript in the
mouse testis after exposure for 10 days. Because of the low level of expression of this
gene, the expression profiles for SPG4 and Spg4 were determined by PCR experiments
on normalized collections of cDNA originating from various adult and fetal tissues (see
fig. 6A to 6C). The murine Spg4 gene is expressed ubiquitously in the adult tissues of
mice, and also from the E7 stage to the E17 stage of mouse embryos (fig. 6A). Higher
expression of Spg4 was detected in the liver, skeletal muscle and testes, and also at the
E15 stage of embryos. The early expression of Spg4 during embryonic development was
confirmed by the presence of ESTs originating from blastocyst, E8 embryo and embryonic
carcinoma cDNA libraries in the public EST databanks. The human SPG4 gene is
also expressed ubiquitously in adult (fig. 6B) and fetal (fig. 6C) tissues, with perhaps more
marked expression in fetal brain.

Example 7 : No oxidative phosphorylation impairment in SPG4 locus-linked AD-HSP

In order to determine whether spastin mutations induced an oxidative
phosphorylation (OXPHOS) impairment in mitochondria, in the same way as had been
observed for paraplegin, a muscle biopsy was performed on a patient from one of the
SPG4 locus-linked AD-HSP families. The morphological and histo-enzymatic analyses of
this muscle biopsy did not reveal any muscle fibers of the RRF (for "ragged red fiber")
type, characteristic of OXPHOS impairments in mitochondria. The fact that all the muscle
fibers appear to be normal, and also the prediction of a nuclear localization for spastin,
seem to indicate that SPG4 locus-linked AD-HSP is not a mitochondrial disease of the
OXPHOS type, unlike SPG7 locus-linked AR-HSP.


Using a positional cloning approach based on sequencing a 1.5 Mb region, we
have identified the SPG4 (or SPAST) gene responsible for the most common form of
AD-HSP, previously located on chromosomal bands 2p21-p22. Thirty-nine mutations
which modify or are likely to modify the gene product, named spastin, could be detected in
the affected individuals from forty-one families with AD-HSP showing a link to the SPG4
locus. Spastin is a novel member of the AAA protein family, which appears to have a
nuclear localization and which shows strong homology with the subunits of the 26S
proteasome of yeast. Despite great homology restricted to a domain of 230 to 250 aa,
termed AAA cassette, the many members of this protein family can participate in very
varied cellular mechanisms, such as the transport of proteins in vesicles, cell cycle
regulation, organelle biogenesis, control of transcription, etc. However, all these
cellular mechanisms involve the assembly, the functioning or the degradation of protein
complexes, which suggests that the members of the AAA family are so-called "chaperone"
proteins.
CLAIMS

1. Purified or isolated nucleic acid of the SPG4 gene, characterized in that it
comprises a sequence chosen from the group comprising:

a) the sequence SEQ ID No. 1, the sequence SEQ ID No. 2, the sequence SEQ ID
No. 72, the sequence SEQ ID No. 106 or the sequence of at least 15 consecutive
nucleotides of one of these sequences;
b) the nucleic acid sequences which are homologs or variants of the sequences SEQ ID
No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106; and
c) the complementary sequence or the RNA sequence corresponding to the sequences
as defined in a) and b).

2. Purified or isolated nucleic acid according to claim 1, with the exception of
the nucleic acid identified in the GenBank databank under the accession number
AB029006.

3. Purified or isolated nucleic acid according to claim 1 or 2, characterized in
that it comprises at least one sequence of at least 15 consecutive nucleotides of the
nt 714-809, ends inclusive, fragment of the sequence SEQ ID No. 2, of the sequence
complementary thereto or of the sequence of the corresponding RNA thereof.

4. Purified or isolated nucleic acid according to one of claims 1 to 3,
characterized in that it comprises a mutation corresponding to a natural polymorphism in
humans.

5. Probe or primer, characterized in that it comprises a sequence of a nucleic
acid according to one of claims 1 to 4.

6. Probe or primer according to claim 5, characterized in that its sequence is
chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71.

7. Splice acceptor or donor site, characterized in that it comprises a sequence
of a nucleic acid according to claim 1 chosen from the sequences SEQ ID No. 74 to SEQ
ID No. 105.

8. Method for screening cDNA or genomic DNA libraries, or for cloning
isolated genomic or cDNA encoding spastin, characterized in that it uses a nucleic acid
sequence according to one of claims 1 to 7.

9. Method according to claim 8, for identifying the genomic or cDNA sequence
of the SPG4 gene of mammals, in particular of mice.

10. Method for identifying a mutation carried by the human SPG4 gene,
characterized in that it uses a nucleic acid sequence according to one of claims 1 to 7.
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11. Method according to claim 10, for identifying a mutation responsible for 
autosomal dominant hereditary spastic paraplegia. 

12. Method for identifying the nucleic acid sequences which promote and/or 
regulate the expression of the SPG4 gene, characterized in that it uses a nucleic acid 

5 sequence according to one of claims 1 to 7. 

13. Nucleic acid identified using a method according to one of claims 9 to 12. 

14. Polypeptide encoded by a nucleic acid according to one of claims 1 to 4 

and 13. 

15. Polypeptide according to claim 14, preferably with the exception of the 584 
10 amino acid peptide, the sequence of which is identified in the GenBank databank under 

the accession number AB029006. 

16. Polypeptide according to claim 14 or 15, characterized in that it comprises 
an amino acid sequence chosen from the group comprising: 

a) the sequence SEQ ID No. 3, the sequence SEQ ID No. 73, the sequence SEQ ID
No. 107 or the sequence of at least 10 consecutive amino acids of one of these
sequences; and

b) the sequences which are homologs or variants of the sequences SEQ ID No. 3, SEQ 
ID No. 73 or SEQ ID No. 107. 

17. Polypeptide according to claim 14 or 15, characterized in that it comprises
the sequence of at least 8 consecutive amino acids of the sequence of the aa 197-228,
ends inclusive, fragment of the sequence SEQ ID No. 3.

18. Polypeptide according to claim 14 or 15, characterized in that it comprises 
an amino acid sequence chosen from the group comprising the sequence SEQ ID No. 3, 
the sequence SEQ ID No. 73, the sequence SEQ ID No. 107, which sequences carry
at least one of the mutations corresponding to a natural polymorphism in humans, and the
sequences of the fragments thereof of at least 10 consecutive amino acids.

19. Cloning and/or expression vector containing a nucleic acid sequence 
according to one of claims 1 to 4, and 13. 

20. Vector according to claim 19, characterized in that it includes the elements 
required for its expression in a host cell.

21. Host cell transformed with a vector according to claim 19 or 20.

22. Mammal, except a human, characterized in that it comprises a cell 
according to claim 21.

23. Mammal, except a human, according to claim 22, comprising a transformed
cell, characterized in that the sequence of at least one of the two alleles of the SPG4 gene
contains at least one of the mutations corresponding to a natural polymorphism in humans
or identified using a method according to claim 10 or 11. 

24. Use of a nucleic acid sequence according to one of claims 5, 6 and 13, as a 
probe or primer, for detecting and/or amplifying nucleic acid sequences.

25. Use of a nucleic acid sequence according to one of claims 1 to 7, and 13,
for screening a genomic or cDNA library.

26. Use of a nucleic acid sequence according to one of claims 1 to 4 and 13, 
for producing a recombinant or synthetic polypeptide. 

27. Method for producing a recombinant polypeptide, characterized in that a 
transformed cell according to claim 21 is cultured under conditions which allow the
expression of said recombinant polypeptide, and in that said recombinant polypeptide is
recovered. 

28. Polypeptide, characterized in that it is obtained using a method according 
to claim 27. 

29. Mono- or polyclonal antibodies or their fragments, chimeric antibodies or
immunoconjugates, characterized in that they are capable of specifically recognizing a
polypeptide according to one of claims 14 to 18, and 28.

30. Method for detecting and/or purifying a polypeptide according to one of
claims 14 to 18, and 28, characterized in that it uses an antibody according to claim 29.

31. Method for genotypic diagnosis of AD-HSP associated with the SPG4
gene, characterized in that a nucleic acid sequence according to one of claims 1 to 7 and
13 is used.

32. Method for genotypic diagnosis of AD-HSP associated with the presence of 
at least one mutation on a sequence of the SPG4 gene, using a biological sample from a
patient, characterized in that it includes the following steps:

a) where appropriate, isolation of the genomic DNA from the biological sample to be 
analyzed, or production of cDNA from the RNA of the biological sample;

b) specific amplification of said DNA sequence of the SPG4 gene likely to contain a
mutation, using primers according to either of claims 5 and 6 or a nucleic acid
according to claim 13;

c) analysis of the amplification products obtained and comparison of their sequence with 
the corresponding normal sequence of the SPG4 gene. 

33. Method for diagnosing AD-HSP associated with abnormal expression of a 
polypeptide encoded by the SPG4 gene, characterized in that one or more antibodies
according to claim 29 is (are) brought into contact with the biological material to be tested,
under conditions which allow the possible formation of specific immunological complexes
between said polypeptide and said antibody or antibodies, and in that the immunological 
complexes possibly formed are detected and/or quantified. 

34. Method for selecting a chemical or biochemical compound which is capable 
of interacting directly or indirectly with a polypeptide according to one of claims 14 to 18,
and 28, or with a nucleic acid according to one of claims 1 to 7, and 13, and/or which
makes it possible to modulate the expression or the activity of these polypeptides,
characterized in that it comprises bringing a nucleic acid sequence according to one of
claims 1 to 7, and 13, a polypeptide according to one of claims 14 to 18, and 28, a vector
according to either of claims 19 and 20, a cell according to claim 21, a mammal according
to either of claims 22 and 23 or an antibody according to claim 29 into contact with a 
candidate compound, and detecting a modification of the activity of said polypeptide. 

35. Use of a nucleic acid sequence according to one of claims 1 to 7, and 13, 
of a polypeptide according to one of claims 14 to 18, and 28, of a vector according to
either of claims 19 and 20, of a cell according to claim 21, of a mammal according to
either of claims 22 and 23 or of an antibody according to claim 29, for studying the 
expression or the activity of the SPG4 gene. 

36. Kit or pack for diagnosis, characterized in that it comprises at least one 
compound chosen from the following group of compounds: 

a) a nucleic acid according to either of claims 5 and 6; and
b) an antibody according to claim 29.

CLONING, EXPRESSION AND CHARACTERIZATION OF THE SPG4 GENE
RESPONSIBLE FOR THE MOST COMMON FORM OF AUTOSOMAL DOMINANT 
SPASTIC PARAPLEGIA 

ABSTRACT OF THE DISCLOSURE 

The invention concerns the identification and characterization of the
SPG4 gene encoding spastin, and of some mutations thereof responsible for the most
frequent form of autosomal dominant familial spastic paraplegia, as well as the cloning and
characterization of its cDNA and the corresponding polypeptides. The invention also
concerns vectors, transformed cells and transgenic animals as well as diagnostic 
methods and kits, and methods for selecting a chemical or biological compound 
capable of directly or indirectly interacting with said polypeptide. 

GCTCCTGAGACCGGCGGGCACACGGGGGTCTGTGGCCCCCGCCGTAGCAGTGGCTGCCGCCGTCGCTTGGTTCCCGTCGGTCTGCGGGAGGCGGG 95

1 TTATGGCGGCGGCGGCAGTGAGAGCTGTGAATGAATTCTCCGGGTGGACGAG(KAAGAAGAAAGGCTCCGGCGGCGCCAGCAACCCGGTGCCTCC 19 0 
1 MNSPGGRGKKKGSGGASNPVPP 

CAGGCCTCCGCCCCCTTGCCTGGCCCCCGrc^ 28 5 
23 RPPPPCLAPAPPAAGPAPPPESPHKRNLYYFS 

CCTACCCGCTGTTTGTAGGCTTCGCGCTGCTGCGTTT^ 380 
55 YPLFVGFALLRLVAFHLGLLFVWLCQRFSRA 

CTCATGGCAGCCAAGAGGAGCTCCGGGGCCGCGCCAGCACCTGCCTCGGCCTCGGCCCCGGCGCCGGTGCCGGGCGGCGAGGCCGAGCGCGTCCG 475 

86 lmaakrssgaapapasasapa p v pggeaervr 

agtcttccacaaacaggccttcgagtacatctccattgccctgcgcatcgatgaggatgagaaagpaggacagaaggagcaagctgtggaatggt 57 0 
118 vfhkqafeyisialrided eka2gqkeqavewy 

ataagaaaggtattgaagaactggaaaaaggaatagctgttatagttacaggacaagjgtgaacagtgtgaaagagctagacgccttcaagctaaa 665 
150 kkgieelekgiavivtgqg3eqcerarri.qak 

atgatgactaatttggttatggcc^ggaccgcttacaacttcra^ 760 
181 mmthlvmakdrlqli.e4kmqpvi.pfsksqtdvy 

taatgacagtactaacttggcatgccgcaatggacatctcc^gt^ 855 
213 hdstnlacrnghlqsessgavpkrkdpi.thtsn 

attcactgcctcgttcaaaaacagttatgaaaactggatc 950 
245 slprsktvmktgsag lsghhrapsysglsmv 

tctggagtgaaacagggatctggtcctgctcctacc^ctc^^ 1045 
276 sgvkqgsgpaptthkg6tpktnrtnkpstptta 

tactcgtaagaaaaaagacttgaagaattttaggaatgtggacagcaaccttgctaaccttataatgaatgaaattgtggacaj^tggaacagctg 1140 

308 TRKKKDLKNFRNVDSNLANLIMNEIVDNG7TAV 

TTAAATTTGATGATATAGCTGGTCAAGACTTGGCAAAAC^G^ 1235 

340 KFDDIAGQDLAKQALQEIVILPSLRPEL8FTG 

CTTAGAGCTCCTGCCAGAGGGCTGTTACTCT^TGGTCCACCTGGGAATGGGAAGACAATGCTG|3CTAAAGCAGTAGCTGCAGAATCGAATGCAAC 1330 

371LRAPARGLLLF GPP GHGKT MLA9KAVAAESNAT 

CTTCTTTAATATAAGTGCTGCAAGTTTAACTTCAAAATACfcTGGG&GAAGGA 1425 

403 FFNISAASLTSKY YlOG EGEKLVRALFAVARELQ 

AACCrrCTATAATTTTTATAG^TGAAGTTGATAGCCTTTTGTGTGAAAGAAGAGAAGGGGAGCACGATGCTAGTAGACGCCTAAAAAC^ 1520 

435 p s I I F I DUE VDSLLCERREGEHDASRRLKTEF 

CTAATAGAATTTGATCGTpTACAGTCTGCTG^ 1615 

466 L I E F D G V12Q S A G D D R VL VMGATNRPQELDEAVL 

CAGb<OTTTCATCAAACGGGTATATGTGTCTT^ 17 10 

498 JUS FIKRVYVSLPNEE T14R LLLLKNLLCKQGSPL 

TX3ACCCAAAAAGAACTAGCACAACTTGCTA(^UVIGACTGATGGATACTC^ 1805 

530 TQKELAQLAR MIST DGYSGSDLTALAKDAALGP 

ATCCGAGksCTAAAACCAGAACaGGTCAAGAArArc^^ 1900 

561 I R E16L KPEQVKNMSASEM17RNIRLSDFTESLKKI 

AAAACGCAGCGTCAGCCCTCAAACTTTAGAAGCGTACATACGTTGGAACAAGGACTTTGGAGATACCACTGTTTAAGGAAATACCTTTGTAAACC 1995 

593 KRSVS PQTLEAY IRWNKDFGDTTV * 

TGCAGAACATTTTACTTAAAAGAGGAAACACAAGATCTTC^TG^ 2090 
TACATATTTGTGCACCAAACTTGAAGATGAACCAGAAAACAGACTTAA&CAAAATATACAATC 2185 
CTTGATGGTCACAGTTATCCC^TGGACACTAAGTTAGAGCAC^ 2280 
AATT1X3TATAT1^TGTTGCAGATGAAAGTATTCCAGGAACAGTGAATGGTAGAAGACACAAGAAC^TTTGTTTGTTTGTCTTCTGATGTTTTTTC 2375 
TTAAAATAGTAATTTCTCCTACTTTTCTTTTCTACTGTTGTCTTAACTACAGGTGATTGGAATGCCA 2470 
GTTTTATAAATTCAGTGTGCCAAATGAAACTTTTTTCCTAAGTAACTGTi^TAGGAAAAAGTTTATTTTGAGAGTTTCTTCTTCATAAATCTACA 2565 
GACATTAAACAATTGTTGTGTTCTTTTTACCTTTTAT^ 2660 
TATGGTAAAATAGAGAAGGTTTGAAGGTTTGAGTTACTCTGTCATATA^^ 2755 
GTATTTGTCAGAAATCTGTTGTAGACTGTTAACTTCT^ 2850 
ACTGCTGTGAAAATGTTTCCAGTGCAAGAGAAGGGAAATACTAGGAACTAAGACATTTCTAATTTATTGCTTATTACTTTCTTAATTTTACAGGA 2945 

TAATTATAAGCAAGTGGAACTACCATCTTTTATTCTTAATAATTATTAATCCCTTCAATGAAACTTTAA 3040 
ACATTTTTCTAGTTCCTTCTGCTTGCTTTATT^ 313 5 

«rr , r , a l r t r , rrzT"i ,t iYrr& t r& R &TaTGT'rTGGaTTT^r.RTTATAAAAATGTCA < rTGTAGGGAGTAGAGACTCATATCATGGCC'rTTTAAATATTGT AflTA 3230 
AAGGCAAATAGATATTTGCCCTTAGTTTACTGG 3263 

[Figure: multiple alignment of the amino acid sequences of human and mouse spastin (SPAST_HUMAN, SPAST_MOUSE) with other protein sequences, including TBP6_YEAST, Q02845_YEAST, SAP1_YEAST, SKD1_MOUSE, O43078_SCHPO, O16299_CAEEL, Q18128_CAEEL, YME1_YEAST and AFG3_YEAST.]

Human: 1 [...] 459

Mouse: I HGGCCGAGhGCGTCCGCGTCTTCCACAAGCAGGCCTTCGAGTACATCTCCATTGCCCTGC 60 

mum iiiiiii milium immmimmmmmimi 

Human: 460 AGGCCGAGCGCGTCCGAGTCTICCACAAACAGGCCTTCGAGTACATCTCCATTGCCCTGC 519 
Mouse: 61 GCATCGACGAGGAAGAGAAAGCAGGACAGAAGGAACAAGCTGTGGAATGGTATAAGAAAG 120 

Mini! urn imimiinniimi mmimmiii Minim 

Human: 520 GCATCGATGAGGATGAGAAAGCAGGACAGAAGGAGCAAGCTGTGGAATGGTATAAGAAAG 579 
Mouse: 121 GTATCGAAGAACTGGAAAAAGGA&TCGCTGTTATAGTTACGGGCCAAGGTGAACAGTATG 180 

mi 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 f 1 1 miiimmii 11 iimiimm u 

Human: 580 GTATTGAAGAACTGGAAAAAGGAATAGCTGTTATAGTTACAGGACAAGGTGAACAGTGTG 639 
Mouse: 181 AAAGAGCTAGACGTCTTCAAGCCAAAATGATGACTAATTTAGTTATGGCCAAGGACCGTT 240 

iiiiiiitiiui iiiiuii iiiiiiiiiiimiii imiiiiimiim i 

Human: 640 AAAGAGCTAGACGCCTTCAAGCTAAAATGATGACTAATTTGGTTATGGCCAAGGACCGCT 699 
Mouse: 241 TACAACTTCTAGAGAAGCTGCAACCAGTITTGCAA1TTTCCAAGTCACAGACGGACGTCT 300 

mmmiimm immmimi mimmmii iimiim 

Human: 700 TACAACTTCTAGAGAAGATGCAACCAGTrtTGCCATTTTCCAAGTCACAAACGGACGTCT 759 
Mouse: 301 ATAACGAGAGTACTAACCTGACATGCCGCAATGGACATCTCCAGTCAGAAAGTGGAGCAG 360 

mi u iiiiiini n miiHiiMtiiiiiiimiimiiniiim i 

Human.: 760 ATAATGACAGTACTAACTTGGCATGCCGCAATGGACATCTCCAGTCAGAAAGTGGAGCTG 819 
Mouse: 361 TTCCGAAGAGGAAAGACCCCTTAACACATGCTAGTAATTCMTGCCTCGATCAAAAACTG 420 

mi it ii nimimmiiii immim mini iiiiuii i 

Human: 820 TTCCAAAAAGAAAAGACCCCTTAACACACACTAGTAATTCACTGCCTCGTTCAAAAACAG 879 
Mouse: 421 TCCTGAAAAGTGGCTCCGCAGGGCTCTCCGGTCACCACAGGGCGCCTAGTTGCAGTGGTT 480 

i Mini in u mil it n ii inn n h iiiiiii mum 

Human.: 880 TTATGAAAACTGGATCTGCAGGCCTTTCAGGCCACCATAGAGCACCTAGTTACAGTGGTT 939 
Mouse: 481 TGTCCATGGTTTCTGGAGCAAGACCGGGACCTGGTCCTGCAGCTACCACACATAAGGGTA 540 

I 1 1 1 1 1 1 ( 1 1 1 1 1 T 1 1 1 I II llll llllllllll IIIIIII 1 1 1 1 1 1 i 1 1 1 

Human: 940 TATCCATGGTTTCTGGAGTGAAACAGGGATCTGGTCXJTGCTCCTACCACTCATAAGGGTA 999 
Mouse: 541 CTCCAAAACCAAATAGAACCAACAAACCTTCTACTCCCACAACTGCAGTTCGGAAAAAGA 600 

mi ill imm ii ii minimi u mum ui u u i 

Human: 1000 CTCCGAAAACAAATAGGACAAATAAACCTTCTACCCCTACAACTGCTACTCGTAAGAAAA 1059 
Mouse: 601 AAGACTTGAAAAATTTTAGGAATGTGGACAGCAATCTTGCTAACCTrATAATGAATGAAA 660 

mnmii uiimmuimiumi uiiiiuiiimi in uui u 

Human: 1060 AAGACTTGAAGAATTTTAGGAATGTGGACAGCAACCTTGCTAACCTTATAATGAATGAAA 1119 
Mouse: 661 TTGTTGACAATGGGACAGCTGTTAAGTTTGATGACATAGCCGGGCAGGAGCTGGCAAAGC 720 

mi iiiiuii uiiimiii mum mil u u u mini i 

Human: 1120 TTGTGGACAA1GGAACAGCTGTTAAATTTGATGATATAGCTGGTCAAGACTTGGCAAAAC 1179 
Mouse: 721 AAGCGCTGCAGGAGATTGTCATCCTTCCTTCTCTGCGGCCTGAGTTGTTCACAGGGCTCA 780 

mi mi ii urn u iiiiiiiinii i mini iiiiiiiimm i 

Human: 1180 AAGCATTGCAAGAAATTGTTATTCTTCCTTCTCIGAGGCCTGAGTTGTTCACAGGGCTTA 1239 
Mouse: 781 GAGCTCCTGCTAGAGGCTTGTTACTCTTCGGTCCGCCAGGAAACGGAAAAACAATGCTGG 84 0 

imiimi urn iimiim urn u u u u u iiiiiiiui 

Human: 1240 GAGCTCCTGCCAGAGGGCTGTTACTCTTTGGTCCACCTGGGAATGGGAAGACAATGCTGG 1299 
Mouse: 841 CTAAAGCAGTAGCTGCAGAGTCTAATGCGACCTTTTTCAACATAAGTGCTGCCAGTTTAA 900 

iiiiiiiiiiiiiiiiiii ii mil urn u it minimi iiiiiii 

Human: 1300 CTAAAGCAGTAGCTGCAGAATCGAATGCAACCTTCTTTAATATAAGTGCTGCAAGTTTAA 1359 
Mouse: 901 CTTCAAAATATGTGGGAGAAGGAGAGAAATTGGTGAGAGCICTCITTGCTGTGGCTCGAG 960 

iiiiiiiiii mimummimmim inn iiiiiiiiiiiiini 

Human: 1360 CTTCAAAATACGTGGGAGAAGGAGAGAAATTGGTGAGGGCTCTTTTTGCTGTGGCTCGAG 1419 
Mouse: 961 AACTTCAACCATCTATAATTTTTATAGATGAAGTTGACAGTCTTTTGTGTGAGAGACGGG 1020 

iiiiiiiiii mimummimmim u iiiiiiiini m t i 

Human: 1420 AACTTCAACCTTCTATAATTTTTATAGATGAAGTTGATAGCCTTTTGTGTGAAAGAAGAG 1479 
Mouse: 1021 AAGGGGAGCACGACGCTAGCAGACGGCTAAAGACGGAATTTTTAATAGAATTTGACGGGG 1080 

iimiiiiiiu urn mu urn u mm mumiuii u i 

Human: 1480 AAGGGGAGCACGATGCTAGTAGACGCCTAAAAACTGAA'ITTCTAATAGAAIITGAIGGTG 1539 
Mouse: 1081 TGCAATCTGCTGGAGATGACAGAGTACTTGTAATGGGTGCAACTAACAGGCCCCAAGAGC 1140 

i ii iiumuimmmimiiimumimm mu iiiiiii 

Human: 1540 TACAGTCTGCTGGAGATGACAGAGtACTTGTAATGGGTGCAACTAATAGGCCACAAGAGC 1599 
Mouse: 1141 TTGATGAAGCTGTTCTCAGGCGTrrCATTAAACGGGTATATGTGTCCTTACCAAATGAGG 1200 

iiiiiii imiiiuiiiiiiiiiii iiiiiiiiiiiiiiiii uuimmii 

Human: 1600 TTGATGAGGCTGTTCTCAGGCGTTTCATCAAACGGGTATATGtGTCTTTACCAAATGAGG 1659 
Mouse: 1201 AGACAAGACTCCTTCTGCTTAAAAACCTGTTGTGTAAACAAGGAAGTCCACTGACCCAAA 1260 

iiiiiiiiii iii iiiiiiiiii mu mmuuumm mmui 

Human: 1660 AGACAAGACTACTTTTGCTTAAAAATCTGTTATGTAAACAAGGAAGTCCATTGACCCAAA 1719 
Mouse: 1261 AAGAACTCGCACAGCTTGCTAGAATGACCGATGGATACTCTGGAAGTGATCTGACCGCTT 1320 

u mu urn 1 1 mu mi in i iiiiiiiiii mum u u mi 

Human: 1720 AAGAACTAGCACAACTTGCTAGAATGACTGATGGATACTCAGGAAGTGACCTAACAGCTT 1779 
Mouse: 1321 TGGCCAAGGATGCAGCCCTGGGTCCTATCCGAGAACTGAAGCCAGAGCAGGTGAAGAATA 1380 

mi u iiiiuii m i i i i i i i i i i i i i i i n ii mu iiiiumuii 

Human: 1780 TGGCAAAAGATGCAGCACTGGGTCCTATCCGAGAACTAAAACCAGAACAGGTGAAGAATA 1839 
Mouse: 1381 TGTCTGCCAGTGAGATGAGAAATATTCGATTATCTGACTTCACAGAATCCTTAAAAAAGA 1440 

iimummiiummiiiiiiiimiiiimii mum mu i 

Human: 1840 TGTCTGCCAGTGAGATGAGAAATATTCGATTATCTGACTTCACTGAATCCTTGAAAAAAA 1899 
Mouse: 1441 TAAAACGCAGTG5GAGTCCTCAGACCTTAGAAGCATACATACGCTGGAACAAGGATTTTG 1500 

iiimiiu u u mu ii iiiiuii mum minimi mi 

Human: 1900 TAAAACGCRGCGTCAGCCCTCAAACTTTAGAAGCGTACATACGTTGGAACAAGGACTTTG 1959 
Mouse: 1501 GAGACACCACTGTTTAAAGGAAT 1S23 

mi mmmm i m 

Human: 1960 GAGATACCACTGTTTAAGGAAAT 1982 
Human: 1983 [...] 3263

Mouse: 1524 GGATGCCTCTGTGAGCCCATAGAACATCGCACTTCACAGGAAACAAGAGCTTTGGCTACA 1583 
1584 GGAACCCAGACTTCGTTTACAGGACGTTTTAGAGTTTTCATTTTTCTGCACCAAACTTGA 1643 
1644 AGAGGAACAAGAAGACAGACCTAAflTAaAATATGCAATATGAATGG 1689 

The invention relates to the identification and characterization of the SPG4 gene
encoding spastin, which is responsible for the most common form of autosomal
dominant hereditary spastic paraplegia (HSP), to the cloning and characterization of its
cDNA, and also to the corresponding polypeptides. The invention also relates to
vectors, to transformed cells and to transgenic animals, and also to diagnostic methods
and kits and to methods for selecting a chemical or biochemical compound capable of
interacting directly or indirectly with a polypeptide according to the invention.

Hereditary spastic paraplegias (HSPs) are degenerative disorders of the central
nervous system, characterized by bilateral and progressive spasticity of the lower limbs.
They manifest clinically as difficulty in walking, which may progress to
total paralysis of both legs. The physiopathology of this set of diseases is, to date,
poorly documented; however, anatomopathological data make it possible to
conclude that the damage is limited to the pyramidal tracts responsible for voluntary
motor function in the spinal cord (Reid, 1997). Various clinical and genetic forms of HSP
exist. The so-called "pure" HSPs, which correspond to isolated spasticity of the lower
limbs, are clinically distinguished from the "complex" HSPs, in which the spasticity of
the legs is associated with other clinical signs of neurological or non-neurological type
(Bruyn et al., 1991). From a genetic point of view, the HSPs can be transmitted
in an autosomal dominant (AD-HSP), autosomal recessive (AR-HSP) or X-linked
(X-HSP) mode. The "pure" form of HSP, which is most commonly transmitted
in the autosomal dominant mode, remains the most frequent (approximately
80% of HSPs) (Reid, 1997). The incidence of HSPs, which remains difficult to estimate
because epidemiological studies are scarce and clinical variability is considerable, ranges
from 0.9 : 100 000 in Denmark, through 3 to 9.6 : 100 000 in certain regions of Spain (Polo et
al., 1991), to 14 : 100 000 in Norway (Skre, 1974) (approximately 3 : 100 000 in
France).

In addition to this great clinical variability, which is observed not only between
different families but also between affected members of the same family, the
HSPs are also characterized by considerable genetic heterogeneity. In the case of
AD-HSPs, four loci have been identified, to date, on chromosomes 14 (locus SPG3)
(Hazan et al., 1993), 2 (locus SPG4) (Hazan et al., 1994; Hentati et al., 1994), 15
(locus SPG6) (Fink et al., 1995) and 8 (locus SPG8) (Hedera et al., 1999). The study of
a large number of families exhibiting an AD-HSP has shown that the locus carried by
chromosome 2 is a major locus of this form of the disease, found in 40 to 50% of the
families analyzed (The Hereditary Spastic Paraplegia Working Group, 1996; Durr et al.,
1996). An anticipation phenomenon was observed in some locus SPG4-linked HSP
families; this phenomenon was subsequently associated with the expansion of a
(CAG)n repeat demonstrated in 6 Danish families (Nielsen et al., 1997) using the RED
(for Repeat Expansion Detection) technique. It has, however, never been possible to
confirm this expansion in any of the families tested by this method or by the systematic
search for sequences of (CAG)n type in physical maps composed of YAC (for Yeast
Artificial Chromosome) or BAC (for Bacterial Artificial Chromosome) clones (Hazan et
al., Genomics, 60 (3), 309-19, 1999).

To date, three genes responsible for two forms of X-HSP and one form of AR- 
HSP have been identified. Mutations in the gene which encodes a neuron-specific cell 
adhesion molecule, L1-CAM (for L1 Cell Adhesion Molecule), and which is located at 
Xq28 (locus SPG1) cause a complex form of HSP (Jouet et al., 1994) in which the 
spasticity is associated with a mental handicap, whereas mutations in the PLP (for 
ProteoLipid Protein) gene located at Xq21 (locus SPG2), which encodes a constitutive 
molecule of the myelin layer, cause pure and complex forms of X-HSP (Saugier-Veber, 
P. et al., 1994). More recently, mutations in the gene located at 16q24.3 (locus SPG7), 
which encodes paraplegin, a mitochondrial ATPase of the AAA (for "ATPases 
Associated with diverse cellular Activities") protein family (Confalonieri et al., 1995), 
have been associated with complex and pure forms of AR-HSP (Casari et al., 1998). 

Thus, there remains, today, a great need to identify and characterize the gene
responsible for the most common form of AD-HSP. Besides making a test for antenatal
screening possible in the families concerned, the identification of this gene
should, in particular, allow a better understanding of some of the molecular mechanisms
underlying these degenerations, which are specific to nerve bundles of the spinal cord, and
may even make it possible to provide an initial response regarding therapeutic treatment for
the patients.

This is precisely the subject of the present invention. 

After having delimited the localization interval between the D2S352 and D2S2347
genetic markers by studying recombination events in locus SPG4-linked HSP families,
the inventors established a contig of BACs covering a physical distance evaluated
at approximately 1.5 Mb and undertook a positional cloning strategy based on
sequencing the SPG4 interval in order to identify all the genes located in the
candidate region. The analysis of the sequence of the two BACs, D (b336P14) and
G (B763N4), has revealed the presence of a gene which is composed of 17 exons,
extending over a distance of approximately 100 kb, and which exhibits homology with
the genes encoding proteins of the AAA family. Comparison of the sequence of this
gene between the healthy and affected individuals of AD-HSP families has made it
possible to demonstrate various mutations in the patients.
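
By way of illustration only, the comparison of a sequence from an affected individual with the corresponding sequence from a healthy individual can be sketched as follows. This is a minimal sketch which assumes that the two sequences have already been aligned to the same length; the function name `point_differences` and the toy sequences are hypothetical and are not taken from the SPG4 gene.

```python
def point_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: both sequences are already aligned and have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    ref, smp = reference.upper(), sample.upper()
    return [(i + 1, r, s) for i, (r, s) in enumerate(zip(ref, smp)) if r != s]


# Toy sequences for illustration; they are not SPG4 sequences.
healthy = "ATGGCTAAGCGT"
patient = "ATGGCTAAGTGT"
print(point_differences(healthy, patient))  # [(10, 'C', 'T')]
```

Each tuple reports a position at which the patient sequence departs from the reference, which is the kind of difference a point mutation would produce.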

A subject of the invention is thus the identification and characterization of the 
SPG4 (or SPAST) gene encoding a novel nuclear member of the AAA family, 
responsible for the most common form of AD-HSP. 

In a first aspect, a subject of the present invention is a purified or isolated 
nucleic acid of the SPG4 gene, characterized in that it comprises at least 15 
consecutive nucleotides, preferably 20, 25, 30, 35, 40, 45, 50, 75, 100 or 200 
consecutive nucleotides, of a sequence chosen from the group comprising: 

- the sequence SEQ ID No. 1, which is a genomic sequence of the human SPG4 gene;

- the nucleic acid sequences which are homologs or variants of the nucleic acid of 
sequence SEQ ID No. 1; 

- the sequence which is complementary thereto; and 

- the sequence of the corresponding RNA thereof. 
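
For illustration, the criterion of "at least 15 consecutive nucleotides" of a reference sequence, of its complementary sequence or of the corresponding RNA can be sketched as below. The helper names (`reverse_complement`, `to_rna`, `shares_run`, `matches_reference`) and the 30-nucleotide toy reference are assumptions made for this example; homologs and variants are not covered by this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Corresponding RNA sequence (T replaced by U)."""
    return dna.upper().replace("T", "U")


def shares_run(candidate: str, reference: str, min_len: int = 15) -> bool:
    """True if candidate contains at least min_len consecutive nucleotides of reference."""
    cand = candidate.upper().replace("U", "T")  # treat RNA like DNA for comparison
    ref = reference.upper()
    windows = {ref[i:i + min_len] for i in range(len(ref) - min_len + 1)}
    return any(cand[i:i + min_len] in windows
               for i in range(len(cand) - min_len + 1))


def matches_reference(candidate: str, reference: str, min_len: int = 15) -> bool:
    """Check the candidate against the reference strand and its complement."""
    return (shares_run(candidate, reference, min_len)
            or shares_run(candidate, reverse_complement(reference), min_len))


# Toy 30-nucleotide reference; it is not SEQ ID No. 1.
reference = "ATGGCTAAGCGTTTGACCGATCCGGAATTC"
probe = reference[5:22]  # 17 consecutive nucleotides of the reference
print(matches_reference(probe, reference))
print(matches_reference(to_rna(reverse_complement(probe)), reference))
```

The second call passes the RNA form of the complementary strand, which is mapped back to DNA before the comparison.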

The present invention relates, of course, to both the DNA and RNA sequences, 
and also the sequences which hybridize with them, as well as the corresponding 
double-stranded DNAs. 

The terms "nucleic acid", "nucleic acid sequence" or "sequence of nucleic acid",
"polynucleotide", "oligonucleotide", "polynucleotide sequence", and "nucleotide
sequence", which will be used interchangeably in the present description, are intended to
refer to a double-stranded DNA, a single-stranded DNA, products of
transcription of said DNAs, and/or an RNA fragment, said isolated natural or synthetic
fragments, which may or may not include unnatural nucleotides, referring to a precise
series of nucleotides, which may or may not be modified, making it possible to define a
fragment or a region of a nucleic acid. The expression "isolated natural or synthetic
DNA and/or RNA fragment, which may or may not include unnatural nucleotides" is
intended to mean a precise series of nucleotides, which may or may not be modified,
making it possible to define a fragment, a segment or a region of a nucleic acid.

It should be understood that the present invention does not relate to the
genomic nucleotide sequences in their natural chromosomal environment, i.e. in the
natural state. It involves sequences which have been isolated and/or purified, i.e. they
have been removed directly or indirectly, for example by copying, their environment
having been at least partially modified.

The term "homologous nucleic acid sequence" is intended to refer to the 
sequences which have, with respect to the reference nucleic acid sequence, certain 
modifications, such as in particular a deletion, a truncation, an extension, a chimeric 
fusion and/or a mutation, in particular a point mutation, and the nucleic acid sequence 
of which shows at least 80%, preferably 90% or 95%, identity after alignment, with the 
reference nucleic acid sequence. 

For the purpose of the present invention, the term "percentage of identity" 
between two nucleic acid or amino acid sequences is intended to refer to a percentage 
of nucleotides or of amino acid residues which are identical between the two 
sequences to be compared, obtained after the best alignment, this percentage being 
purely statistical and the differences between the two sequences being distributed 
randomly and throughout their length. Sequence comparisons between two nucleic 
acid or amino acid sequences are traditionally carried out by comparing these 
sequences after having optimally aligned them, said comparison being carried out by 
segment or by "window of comparison" in order to identify and compare local regions of 
sequence similarity. The optimal alignment of the sequences for comparison can be 
produced, besides manually, by means of the local homology algorithm of Smith and
Waterman (1981) [Adv. Appl. Math. 2:482], by means of the homology alignment algorithm of
Needleman and Wunsch (1970) [J. Mol. Biol. 48:443], by means of the similarity search
method of Pearson and Lipman (1988) [Proc. Natl. Acad. Sci. USA 85:2444], and by
means of computer programs using these algorithms (GAP, BESTFIT, FASTA and
TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575
Science Dr., Madison, WI, or with the BLAST N or BLAST P comparison programs).
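
By way of illustration, a local alignment in the manner of Smith and Waterman can be sketched as follows. This is a simplified sketch, not the implementation used by the programs cited above: the scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name `smith_waterman` are assumptions chosen for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences sharing the local region ACGTACG.
print(smith_waterman("TTTACGTACGAAA", "CCCACGTACGGGG"))  # (14, 'ACGTACG', 'ACGTACG')
```

The aligned strings returned here define a "window of comparison" over which local similarity can then be evaluated.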

The percentage of identity between two nucleic acid or amino acid sequences is 
determined by comparing these two optimally aligned sequences by window of 
comparison in which the region of the nucleic acid or amino acid sequence to be 
compared can comprise additions or deletions with respect to the reference sequence 
for optimal alignment between these two sequences. The percentage of identity is
calculated by determining the number of positions at which the nucleotide or
the amino acid residue is identical between the two sequences, dividing this number of
identical positions by the total number of positions in the window of comparison and
multiplying the result obtained by 100 so as to obtain the percentage of identity
between these two sequences.
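
The calculation described above can be sketched as follows, assuming that the two sequences are supplied already aligned, with "-" marking an addition or deletion, and that the window of comparison is the whole aligned length. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions / total positions in the window, multiplied by 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("window of comparison is empty")
    # A gap character never counts as an identical position.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


# Toy alignment: 10 positions, 8 of them identical.
print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```

In this toy window, the gap at position 5 and the mismatch at position 9 are the two non-identical positions, giving 8 / 10 x 100 = 80%.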

For example, the BLAST program "BLAST 2 sequences" (Tatusova et al., "Blast
2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS
Microbiol. Lett. 174:247-250), available on the site
http://www.ncbi.nlm.nih.gov/gorf/bl2.html, may be used, the parameters used being
those given by default (in particular for the parameters "open gap penalty": 5, and
"extension gap penalty": 2; the matrix chosen being, for example, the "BLOSUM 62"
matrix proposed by the program), the percentage of identity between the two
sequences to be compared being calculated directly by the program.

It preferably involves sequences for which the complementary sequences are 
capable of hybridizing specifically with one of the sequences of the invention. 
Preferably, the specific or high stringency hybridization conditions will be such that they 
ensure at least 80%, preferably 90% or 95%, identity after alignment between one of
the two sequences and the sequence which is complementary to the other.

Hybridization under high stringency conditions means that the temperature and 
ionic strength conditions are chosen such that they allow the hybridization between two 
complementary DNA fragments to be maintained. By way of illustration, high stringency 
conditions of the hybridization step for the purposes of defining the polynucleotide
fragments described above are advantageously as follows.

The DNA-DNA or DNA-RNA hybridization is carried out in two steps:
(1) prehybridization at 42°C for 3 hours in phosphate buffer (20 mM, pH 7.5) containing
5 x SSC (1 x SSC corresponds to a 0.15 M NaCl + 0.015 M sodium citrate solution),
50% formamide, 7% sodium dodecyl sulfate (SDS), 10 x Denhardt's solution, 5% dextran
sulfate and 1% salmon sperm DNA; (2) actual hybridization for 20 hours at a
temperature dependent on the size of the probe (i.e. 42°C for a probe of size > 100
nucleotides), followed by two 20-minute washes at 20°C in 2 x SSC + 2% SDS and one
20-minute wash at 20°C in 0.1 x SSC + 0.1% SDS. The final wash is carried out in
0.1 x SSC + 0.1% SDS for 30 minutes at 60°C for a probe of size > 100 nucleotides.
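
For convenience, the salt concentrations implied by the SSC dilutions named above can be derived from the stated definition of 1 x SSC. The short sketch below only performs that arithmetic; the constant and function names are illustrative assumptions.

```python
from typing import Tuple

# 1 x SSC as defined in the text: 0.15 M NaCl + 0.015 M sodium citrate.
SSC_1X_NACL_M = 0.15
SSC_1X_CITRATE_M = 0.015


def ssc_molarity(fold: float) -> Tuple[float, float]:
    """Return (NaCl, sodium citrate) molarity of an n x SSC solution."""
    if fold <= 0:
        raise ValueError("the SSC concentration factor must be positive")
    return fold * SSC_1X_NACL_M, fold * SSC_1X_CITRATE_M


if __name__ == "__main__":
    # Dilutions used in the prehybridization and wash steps above.
    for fold in (5, 2, 0.1):
        nacl, citrate = ssc_molarity(fold)
        print(f"{fold} x SSC: {nacl:.4f} M NaCl, {citrate:.4f} M sodium citrate")
```

The lower the SSC factor of a wash, the lower its ionic strength and the more stringent the wash, which is why the final wash uses 0.1 x SSC.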

The high stringency hybridization conditions described above for a polynucleotide of
defined size will be adjusted by those skilled in the art for oligonucleotides of greater or 
smaller size, according to the teaching of Sambrook et al., 1989. 

The term "nucleic acid sequence which is a variant" or "nucleic acid which is a
variant" of a reference nucleic acid sequence will be intended to refer to the set of
nucleic acid sequences corresponding to allelic variants, i.e. individual variations of the
reference nucleic acid sequence. These natural mutated sequences correspond to
polymorphisms present in mammals, in particular in human beings, and in particular to 
polymorphisms which can cause a pathology to occur and/or to develop. 

While the sequences according to the invention relate to normal sequences, 
they also relate to sequences which are mutated insofar as they include at least one 
point mutation, and preferably at most 10% of mutations, with respect to the normal 
sequence. 

In particular, the variant nucleic acid sequences will comprise any sequence of 
at least 15 consecutive nucleotides, preferably 20, 25, 30, 50, 100 or 200 consecutive 
nucleotides, of a polymorphic sequence of the genomic sequence of the human SPG4 
gene of sequence SEQ ID No. 1, and the nucleic acid sequence of which has, with 
respect to the sequence SEQ ID No. 1, at least one mutation corresponding in 
particular to a truncation, deletion, substitution and/or addition of an amino acid 
residue. In the present case, the variant nucleic acid sequences having at least one 
mutation will herein be linked to the pathologies of AD-HSP type linked to the SPG4 locus.

Preferably, the present invention relates to the mutated nucleic acid sequences 
in which the mutations produce a modification of the amino acid sequence of the 
polypeptide encoded by the normal sequence. 

The term "variant nucleic acid sequences" will also be intended to refer to any 
RNA or cDNA resulting from a mutation of a splice site of the genomic nucleic acid 
sequence SEQ ID No. 1. 

Preferably, the invention relates to a purified or isolated nucleic acid of the 
SPG4 gene according to the invention, characterized in that it comprises a sequence 
chosen from the group comprising: 

a) the sequence SEQ ID No. 1, the sequence SEQ ID No. 2, the sequence SEQ ID 
No. 72, the sequence SEQ ID No. 106 or the sequence of at least 15, preferably 20, 
25, 30, 35, 40, 45, 50, 75, 100 or 200, consecutive nucleotides of the sequence 
SEQ ID No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106; 

b) the nucleic acid sequences which are homologs or variants of the sequences SEQ 
ID No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106; and 

c) the complementary sequence or the RNA sequence corresponding to the 
sequences as defined in a) and b), 

preferably with the exception of the nucleic acid identified in the GenBank database 
under the accession number AB029006. 



The nucleic acid the sequence of which is disclosed in the GenBank database 
under the accession number AB029006 corresponds to the sequence of one of the 
100 cDNAs derived from a human brain mRNA library identified by the Kazusa DNA 
Research Institute in Japan (Kikuno et al., DNA Research, 6, 197-205, 1999).

Preferably, the invention relates to a purified or isolated nucleic acid according 
to the invention, characterized in that it comprises at least one sequence of at least 15 
consecutive nucleotides, preferably 20, 25, 30, 50 or 75 consecutive nucleotides, of the
fragment spanning nt 714-809 (ends inclusive) of the sequence SEQ ID No. 2, of the
sequence complementary thereto or of the sequence of the corresponding RNA thereof.
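
To make the coordinate convention concrete, the sketch below extracts a fragment given 1-based, ends-inclusive positions and enumerates every run of consecutive nucleotides of a chosen size within it. The sequence used is a made-up placeholder, not SEQ ID No. 2, and the function names are assumptions of this sketch.

```python
from typing import Iterator


def fragment(sequence: str, start: int, end: int) -> str:
    """Return positions start..end (1-based, ends inclusive) of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


def consecutive_windows(region: str, size: int) -> Iterator[str]:
    """Yield every run of `size` consecutive residues contained in region."""
    for offset in range(len(region) - size + 1):
        yield region[offset:offset + size]


# Placeholder standing in for a cDNA sequence (NOT the real SEQ ID No. 2).
placeholder_cdna = "ACGT" * 250
region = fragment(placeholder_cdna, 714, 809)
candidates = list(consecutive_windows(region, 15))

if __name__ == "__main__":
    print(len(region), len(candidates))
```

With ends inclusive, nt 714-809 covers 96 positions, so the region contains 82 distinct windows of 15 consecutive nucleotides.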

The invention preferably relates to a purified or isolated nucleic acid according 
to the present invention, characterized in that it comprises a sequence chosen from the 
following group: 

- the sequence SEQ ID No. 1; 

- the sequence SEQ ID No. 2, which is the cDNA sequence encoding human spastin; 

- the sequences SEQ ID No. 72 and SEQ ID No. 106, the sequence SEQ ID No. 72 
representing the sequence of the incomplete cDNA encoding murine spastin 
represented in Figure 5, "mouse" line, and the SEQ ID No. 106 representing the 
complete sequence thereof; 

- the nucleic acid sequences which are homologs or variants of the sequences SEQ ID 
No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106; 

- the sequence complementary thereto; and 

- the sequence of the corresponding RNA thereof. 
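
The last two items of the list above can be illustrated with a minimal sketch that derives the complementary strand and the corresponding RNA from a DNA sequence. It assumes an unambiguous A/C/G/T alphabet and writes the complementary strand in the 5' to 3' direction (reverse complement); the function names are illustrative.

```python
# Translation table for Watson-Crick complementary bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def corresponding_rna(dna: str) -> str:
    """RNA sequence corresponding to a DNA sequence (T replaced by U)."""
    return dna.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    print(reverse_complement("ATGC"), corresponding_rna("ATGC"))
```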

Preferably, the invention relates to a purified or isolated nucleic acid according 
to the invention, characterized in that it comprises at least one mutation which 
corresponds to a natural polymorphism in humans, in particular the position and nature 
of which are identified in Table 5. 

The primers or probes, characterized in that they comprise a sequence of a 
nucleic acid according to the invention, also form part of the invention. 

The present invention thus relates to the set of primers which can be deduced 
from the nucleotide sequences of the invention and which may make it possible to 
demonstrate said nucleotide sequences of the invention, in particular the mutated 
sequences, using in particular an amplification method such as the PCR method, or a 
related method. 

The present invention also relates to the set of probes which can be deduced 
from the nucleotide sequences of the invention, in particular from the sequences
capable of hybridizing with them, and which may make it possible to demonstrate said
nucleotide sequences, in particular to distinguish the normal sequences from the 
mutated sequences. 

The present invention relates, in particular, to the probes or primers having 
sequences chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71. 

The invention also relates to the use of a nucleic acid sequence according to 
the invention as a probe or primer, for detecting, identifying, assaying or amplifying a 
nucleic acid sequence. 

According to the invention, the polynucleotides which can be used as a probe or 
as a primer in processes for detecting, identifying, assaying or amplifying a nucleic acid 
sequence will have a minimum size of 15 bases, preferably of 20 bases, or better still 
of 25 to 30 bases. 

The set of probes and primers according to the invention may be labeled 
directly or indirectly with a radioactive or nonradioactive compound, using methods well 
known to those skilled in the art, in order to obtain a detectable and/or quantifiable 
signal. 

The nonlabeled polynucleotide sequences according to the invention can be 
used directly as a probe or primer. 

The sequences are generally labeled so as to obtain sequences which can be 
used for many applications. The labeling of the primers or of the probes according to 
the invention is carried out with radioactive elements or with nonradioactive molecules. 

Among the radioactive isotopes used, mention may be made of 32P, 33P, 35S, 3H
or 125I. The nonradioactive entities are selected from ligands, such as biotin, avidin or
streptavidin, digoxigenin, haptens, colorants and luminescent agents, such as
radioluminescent, chemiluminescent, bioluminescent, fluorescent or phosphorescent 
agents. 

The polynucleotides according to the invention can thus be used as a primer 
and/or probe in processes using, in particular, the PCR (polymerase chain reaction) 
technique (Erlich, 1989; Innis et al., 1990, and Rolfs et al., 1991). This technique 
requires choosing pairs of oligonucleotide primers framing the fragment which must be 
amplified. Reference may, for example, be made to the technique described in 
American patent US No. 4,683,202. The amplified fragments can be identified, for 
example after agarose or polyacrylamide gel electrophoresis, or after a 
chromatographic technique such as gel filtration or ion exchange chromatography, and 
then sequenced. The specificity of amplification can be controlled using, as primers,
the nucleotide sequences of polynucleotides of the invention and, as a template, plasmids
containing these sequences or the derived amplification products. The amplified
nucleotide fragments can be used as reagents in hybridization reactions in order to
demonstrate the presence, in a biological sample, of a target nucleic acid having a
sequence complementary to that of said amplified nucleotide fragments.

The invention is also directed toward the nucleic acids which can be obtained 
by amplification using primers according to the invention. 

Other techniques for amplifying the target nucleic acid can be advantageously
employed as an alternative to PCR (PCR-like), using pairs of primers having nucleotide
sequences according to the invention. The term "PCR-like" will be intended to refer to
all methods using direct or indirect reproductions of nucleic acid sequences, or in which
the labeling systems have been amplified. These techniques are, of course, known. In
general, they involve amplifying the DNA with a polymerase; when the sample of origin
is an RNA, it is advisable to perform reverse transcription beforehand. There are,
currently, a great many processes which enable this amplification, such as for example
the SDA (Strand Displacement Amplification) technique (Walker et al., 1992), the TAS
(Transcription-based Amplification System) technique described by Kwoh et al. in
1989, the 3SR (Self-Sustained Sequence Replication) technique described by Guatelli
et al. in 1990, the NASBA (Nucleic Acid Sequence Based Amplification) technique
described by Kievits et al. in 1991, the TMA (Transcription Mediated Amplification)
technique, the LCR (Ligase Chain Reaction) technique described by Landegren et al.
in 1988 and improved by Barany et al. in 1991, which uses a heat-stable ligase, the
RCR (Repair Chain Reaction) technique described by Segev in 1992, the CPR (Cycling
Probe Reaction) technique described by Duck et al. in 1990, and the Q-beta-replicase
amplification technique described by Miele et al. in 1983 and improved, in particular, by
Chu et al. in 1986 and Lizardi et al. in 1988, and then by Burg et al., and also by Stone
et al., in 1996.

When the target polynucleotide to be detected is an mRNA, use will 
advantageously be made, prior to carrying out an amplification reaction using the 
primers according to the invention or carrying out a detection process using the probes
of the invention, of an enzyme of reverse transcriptase type in order to obtain a cDNA 
from the mRNA contained in the biological sample. The cDNA obtained will then serve 
as a target for the primers or probes used in the amplification or detection process 
according to the invention. 

The probe hybridization technique can be carried out in diverse ways (Matthews
et al., 1988). The most general method consists in immobilizing the nucleic acid 
extracted from the cells of various tissues or from cells in culture, on a support (such as 
nitrocellulose, nylon or polystyrene), and in incubating the immobilized target nucleic 
acid with the probe, under well defined conditions. After hybridization, the excess probe 
is eliminated and the hybrid molecules formed are detected using the appropriate 
method (measurement of the radioactivity, of the fluorescence or of the enzymatic 
activity linked to the probe). 

According to another embodiment of the nucleic acid probes according to the 
invention, the latter can be used as a capture probe. In this case, a probe, termed 
"capture probe", is immobilized on a support and is used to capture, by specific 
hybridization, the target nucleic acid obtained from the biological sample to be tested, 
and the target nucleic acid is then detected using a second probe, termed "detection 
probe", labeled with an easily detectable element. 

The splice acceptor or donor site sequences according to the present invention 
identified in Table 3 (sequences SEQ ID No. 74 to SEQ ID No. 105) also form part of 
the present invention. 

In another aspect, the invention comprises a method for screening cDNA or 
genomic DNA libraries, or for cloning isolated genomic DNA or cDNA encoding spastin,
characterized in that it uses a nucleic acid sequence according to the invention. 

Among these methods, mention may be made in particular of:
- the screening of cDNA libraries and the cloning of the isolated cDNAs (Sambrook et
al., 1989; Suggs et al., 1981; Woo et al., 1979), using the nucleic acid sequences
according to the invention;
- the screening of genomic libraries, for example of BACs (Chumakov et al., 1992;
Chumakov et al., 1995), and, optionally, a genetic analysis by FISH (Cherif et al., 
1990), using sequences according to the invention, enabling the isolation and 
chromosomal localization, and then the complete sequencing, of the SPG4 gene 
encoding spastin. 

In particular, these methods according to the invention may be used for 
identifying and thus obtaining the genomic sequence or the cDNA of the SPG4 gene in 
other mammals, in particular mice. 

These screening and/or cloning methods will comprise, in particular, a step of 
hybridization of a nucleic acid according to the invention with a nucleic acid contained 
in a genomic or cDNA library. 

The invention also comprises a method for identifying the nucleic acid
sequences which promote and/or regulate the expression of the SPG4 gene of 
sequence SEQ ID No. 1, characterized in that it uses a nucleic acid according to the 
invention. 

The computer tools available to those skilled in the art enable them to easily
identify, using the genomic nucleic acid sequences according to the invention, the
promoter regulatory boxes required and sufficient for controlling gene expression, in
particular the TATA, CCAAT and GC boxes, and also the stimulatory regulatory
sequences ("enhancers"), or inhibitory regulatory sequences ("silencers"), which
control, in cis, the expression of the genes according to the invention; among these
regulatory sequences, mention should be made of IRE, MRE and CRE.
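
As a rough idea of what such a computer search involves, the sketch below scans one strand of a sequence for simplified consensus motifs of the three boxes named above. The motif patterns, the dictionary and the function name are assumptions made for illustration; real promoter prediction tools rely on weight matrices and positional context rather than exact motifs.

```python
import re
from typing import Dict, List

# Simplified consensus motifs (assumptions for illustration only).
BOX_PATTERNS = {
    "TATA box": "TATA[AT]A",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}


def find_boxes(sequence: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each motif on the given strand."""
    seq = sequence.upper()
    hits: Dict[str, List[int]] = {}
    for name, pattern in BOX_PATTERNS.items():
        # The lookahead makes overlapping occurrences visible as well.
        hits[name] = [m.start() for m in re.finditer("(?=" + pattern + ")", seq)]
    return hits


if __name__ == "__main__":
    print(find_boxes("GGGCGGATCCAATCGTATAAAG"))
```

Scanning the reverse complement of the same sequence with the same function would cover the opposite strand.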

The invention also relates to the methods for identifying mutations carried by 
the human SPG4 gene, in particular mutations responsible for autosomal dominant 
hereditary spastic paraplegia, characterized in that they use a nucleic acid sequence
according to the invention.

These methods for identifying these mutations will, in particular, comprise the 
following steps: (i) isolation of the DNA from the biological sample to be analyzed, or 
production of a cDNA from the mRNA of the biological sample; (ii) specific amplification 
of the target DNA likely to have a mutation, using primers according to the invention;
(iii) analysis of the amplification products, in particular the size and/or the sequence of
the amplification products, with respect to a reference sequence. 
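
Step (iii) can be sketched, under strong simplifying assumptions, as a comparison of the sequenced amplification product with the reference sequence. The function below (a hypothetical name) reports the size difference and, position by position, any substitutions; the positional comparison is only meaningful when the two sequences have the same length, since insertions or deletions would first require an alignment.

```python
from typing import List, Tuple


def compare_to_reference(
    reference: str, product: str
) -> Tuple[int, List[Tuple[int, str, str]]]:
    """Return (size difference, [(1-based position, reference base, observed base)])."""
    size_difference = len(product) - len(reference)
    substitutions = [
        (position, ref_base, obs_base)
        for position, (ref_base, obs_base) in enumerate(
            zip(reference.upper(), product.upper()), start=1
        )
        if ref_base != obs_base
    ]
    return size_difference, substitutions


if __name__ == "__main__":
    print(compare_to_reference("ATGCATGC", "ATGTATGC"))
```

A nonzero size difference already points to an insertion, deletion or splicing change, even before the sequence itself is examined.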

The expression "methods for identifying a mutation according to the invention" 
is also intended to refer to a method which makes it possible to obtain the nucleic acid 
on which said mutation has been identified. 

The promoter and/or regulatory sequences of the SPG4 gene according to the
invention having mutations which may modify the expression of the corresponding
protein also form part of the invention. 

The nucleic acids characterized in that they can be obtained using one of the 
preceding methods according to the invention, or the nucleic acids capable of
hybridizing, under high stringency conditions (homology of at least 80% between one of
the two sequences and the sequence complementary to the other), with said nucleic
acids, form part of the invention, especially the variant or homologous nucleic acids, in
particular the nucleic acid sequences of allelic variants of the SPG4 gene of sequence
SEQ ID No. 1 or of its cDNA of sequence SEQ ID No. 2, and also the genomic
sequences of the homologous genes of other mammals such as mice.

In the present description, the term "Spg4" will be intended to refer to the
mouse gene homologous to the human SPG4 gene. 

The use of a nucleic acid sequence according to the invention as a probe or
primer for screening a genomic or cDNA library of course forms part of the subject of
the present invention.

In another aspect, the invention comprises a purified or isolated polypeptide 
encoded by a nucleic acid according to the invention, preferably with the exception of 
the 584 amino acid peptide, the sequence of which is identified in the GenBank 
database under the accession number AB029006. 

In the present description, the term "polypeptide" will be used to refer equally to
a protein or a peptide.

Preferably, the present invention relates to a polypeptide according to the 
invention, characterized in that it comprises an amino acid sequence chosen from the 
following group: 

- the sequence SEQ ID No. 3, corresponding to human spastin encoded by the
sequence SEQ ID No. 2 of the cDNA of the human SPG4 gene;

- the sequence SEQ ID No. 73, corresponding to a fragment of murine spastin encoded
by the sequence SEQ ID No. 72 of the incomplete cDNA of the mouse Spg4 gene,
the sequence SEQ ID No. 73 being represented in Figure 4A, "SPAST_MOUSE" line;

- the sequence SEQ ID No. 107, corresponding to murine spastin encoded by the
sequence SEQ ID No. 106 of the complete cDNA of the mouse Spg4 gene;

- the sequences of polypeptides which are homologs and variants of the polypeptide of
sequence SEQ ID No. 3, SEQ ID No. 73 or SEQ ID No. 107; and

- the sequences of the fragments thereof of at least 8, 10, 15, 30 or 50 consecutive
amino acids.

Also preferably, a subject of the invention is a polypeptide according to the 
invention, characterized in that it comprises an amino acid sequence chosen from the 
group comprising: 

a) the sequence SEQ ID No. 3, the sequence SEQ ID No. 73, the sequence SEQ ID
No. 107 or the sequence of at least 10 consecutive amino acids of one of these
sequences; and

b) the sequences which are homologs or variants of the sequences SEQ ID No. 3,
SEQ ID No. 73 or SEQ ID No. 107.

Also preferably, a subject of the invention is a polypeptide according to the 
invention, characterized in that it comprises the sequence of at least 8, preferably of at
least 10, 15, 20 or 30, consecutive amino acids of the fragment spanning aa 197-228
(ends inclusive) of the sequence SEQ ID No. 3.

Also preferably, a subject of the invention is a polypeptide according to the 
invention, characterized in that it comprises an amino acid sequence chosen from the 
following group:

- the sequence SEQ ID No. 3, the sequence SEQ ID No. 73 and the sequence SEQ ID
No. 107, which sequences carry at least one of the mutations corresponding to a
natural polymorphism in humans, in particular those the nature and location of which
are identified in Table 5 hereinafter, or those which may be identified using the
methods for identifying mutations of the SPG4 gene, according to the present
invention; and 

- the sequences of the fragments thereof of at least 8, 10, 15, 30 or 50 consecutive 
amino acids. 

It should be understood that the invention does not relate to polypeptides in 
natural form, i.e. to polypeptides taken in their natural environment. Specifically, the invention
relates to the peptides which are obtained by purification from natural sources, or
obtained by genetic recombination or by chemical synthesis, and which can therefore
include unnatural amino acids. The production of a recombinant polypeptide, which can
be carried out using one of the nucleotide sequences according to the invention, is
particularly advantageous since it makes it possible to obtain an increased degree of
purity of the desired polypeptide. 

The term "homologous polypeptide" will be intended to refer to the polypeptides 
which have certain modifications with respect to the reference polypeptide, such as in 
particular one or more deletions or truncations, an extension, a chimeric fusion and/or 
one or more substitutions, and the amino acid sequence of which shows at least 80%,
preferably 90% or 95%, identity after alignment, with the reference amino acid 
sequence. 

The term "variant polypeptide" (or protein variant) will be intended to refer to the 
set of polypeptides encoded by the variant nucleic acid sequences as defined above. 

In particular, the variant polypeptides will comprise any polypeptide which is
encoded by the mutated genomic sequence of the SPG4 gene of sequence SEQ ID
No. 1, and the amino acid sequence of which has at least one mutation corresponding
in particular to a truncation, deletion, substitution and/or addition of amino acid residues
with respect to the sequence SEQ ID No. 3. In the present case, the variant
polypeptides having at least one mutation will be linked to the pathologies of AD-HSP
type. 

The term "variant polypeptide" will also be intended to refer to any polypeptide 
resulting from mutation of a splice site in the genomic nucleic acid sequence SEQ ID 
No. 1.

The invention also comprises the cloning and/or expression vectors containing 
a nucleic acid sequence according to the invention. 

The vectors according to the invention, characterized in that they include the 
elements which allow the expression and/or the secretion of said sequences in a host 
cell, or a cellular addressing sequence, also form part of the invention.

The vectors characterized in that they include a promoter and/or regulator 
sequence according to the invention also form part of the invention. 

Said vectors will preferably include a promoter, translation initiation and 
termination signals, and also suitable regions for regulating the transcription. They 
should be able to be maintained stably in the cell and can, optionally, have particular
signals which specify secretion of the translated protein. 

These various control signals are chosen as a function of the host cell used. To 
this effect, the nucleic acid sequences according to the invention can be inserted into 
vectors which replicate autonomously in the host chosen, or vectors which integrate in 
the host chosen.

Among the systems which replicate autonomously, use will preferably be made, 
as a function of the host cell, of the systems of plasmid or viral type, the viral vectors 
possibly in particular being adenoviruses (Perricaudet et al., 1992), retroviruses, 
lentiviruses, poxviruses or herpesviruses (Epstein et al., 1992). Those skilled in the art
know the technology which can be used for each of these systems.

When integration of the sequence into the chromosomes of the host cell is 
desired, use may be made, for example, of the systems of plasmid or viral type; such 
viruses will, for example, be retroviruses (Temin, 1986), or AAVs (Carter, 1993). 

Among the nonviral vectors, preference is given to naked polynucleotides such 
as naked DNA or naked RNA according to the technique developed by the company
VICAL, yeast artificial chromosomes (YAC) for expression in yeast, mouse artificial 
chromosomes (MAC) for expression in murine cells and, preferably, human artificial 
chromosomes (HAC) for expression in human cells. 

Such vectors will be prepared according to the methods commonly used by 
those skilled in the art, and the clones resulting therefrom can be introduced into a
suitable host using standard methods, such as for example lipofection, electroporation
or heat shock. 

The invention also comprises the host cells, in particular the eukaryotic and 
prokaryotic cells, transformed with the vectors according to the invention, and also the 
transgenic animals, except humans, comprising one of said transformed cells
according to the invention. 

Among the cells which can be used for these purposes, mention may of course 
be made of bacterial cells (Olins and Lee, 1993), but also yeast cells (Buckholz, 1993), 
as well as animal cells, in particular cultures of mammalian cells (Edwards and Aruffo, 
1993), and especially Chinese hamster ovary (CHO) cells, but also insect cells in which
it is possible to use processes implementing baculoviruses, for example (Luckow, 
1993). A preferred cellular host for expressing the proteins of the invention consists of 
CHO cells. 

Among the mammals according to the invention, preference will be given to 
animals such as mice, rats or rabbits, expressing a polypeptide according to the
invention. 

Among the mammals according to the invention, preference will also be given to 
those comprising a transformed cell characterized in that the sequence of at least one 
of the two alleles of the SPG4 gene contains at least one of the mutations 
corresponding to a natural polymorphism in humans, in particular those the nature and
location of which are identified in Table 5 hereinafter, or those which may be identified 
using the methods for identifying a mutation of the SPG4 gene, according to the 
present invention. 

Among the mammals according to the invention, preference will also be given to 
animals such as mice, rats or rabbits, characterized in that the gene encoding spastin
according to the invention is not functional or is knocked out. 

Among the animal models more particularly advantageous herein, there are, in 
particular: 

- the transgenic animals having, at least in one of their two allelic sequences of the 
SPG4 gene, at least one of the mutations the position and nature of which are
identified in Table 5 or identified using a method according to the present invention. 
These transgenic animals are obtained, for example, by homologous recombination 
on embryonic stem cells, transfer of these stem cells to embryos, selection of the 
chimeras affected in the reproductive lines, and growth of said chimeras; 

- the transgenic animals (preferably mice) overexpressing the SPG4 gene into which
one of said mutations according to the invention may be introduced. The mice are
obtained, for example, by transfection of a copy of this gene under the control of a
strong promoter which is ubiquitous in nature or selective for a tissue type, or after
viral transcription;

- the transgenic animals (preferably mice) made deficient for the SPG4 gene according 
to the invention by inactivation using the LOXP/CRE recombinase system (Rohlmann 
et al., 1996) or any other system for inactivating the expression of this gene. 

The cells and mammals according to the invention can be used in a method for
producing a polypeptide according to the invention, as described below, and can also
be used as a model for analysis and for DNA (genomic or cDNA) library screening. 

The transformed cells or mammals as described above can thus be used as 
models in order to study the interactions between the polypeptides according to the 
invention, and chemical or protein compounds, which are involved directly or indirectly
in the activities of the polypeptides according to the invention, this being in order to
study the various mechanisms and interactions which come into play. 

They can especially be used for selecting products which interact with the 
polypeptides according to the invention, in particular human spastin of sequence SEQ 
ID No. 3 or the variants thereof according to the invention, as a cofactor or as an
inhibitor, in particular a competitive inhibitor, or which have agonist or antagonist
activity for the activity of the polypeptides according to the invention. Preferably, said 
transformed cells or transgenic animals will be used as a model which, in particular, 
enables the selection of products which make it possible to combat the pathology 
linked to the SPG4 gene mentioned above. 

The invention also relates to the use of a cell, of a mammal or of a polypeptide
according to the invention for screening a chemical or biochemical compound which
can interact directly or indirectly with the polypeptides according to the invention, 
and/or which is capable of modulating the expression or the activity of these 
polypeptides. 

The invention also relates to the use of a nucleic acid sequence according to
the invention for synthesizing recombinant polypeptides.

The method for producing a polypeptide of the invention in recombinant form is, 
itself, included in the present invention, and is characterized in that the transformed 
cells, in particular the cells or mammals of the present invention, are cultured under
conditions which allow the expression of a recombinant polypeptide encoded by a
nucleic acid sequence according to the invention, and in that said recombinant
polypeptide is recovered. 

The recombinant polypeptides, characterized in that they can be obtained using 
said production method, also form part of the invention. 

The recombinant polypeptides obtained as indicated above can be in both
glycosylated and nonglycosylated form and may or may not have the natural tertiary
structure. 

These polypeptides can be produced based on the nucleic acid sequences 
defined above, according to the techniques for producing recombinant polypeptides 
known to those skilled in the art. In this case, the nucleic acid sequence used is placed
under the control of signals which allow its expression in a cellular host. 

An effective system for producing a recombinant polypeptide requires a vector 
and a host cell according to the invention. 

These cells can be obtained by introducing into host cells a nucleotide
sequence inserted into a vector as defined above, and then culturing said cells under
conditions which allow the replication and/or expression of the transfected nucleotide 
sequence. 

The processes for purifying a recombinant polypeptide which are used are 
known to those skilled in the art. The recombinant polypeptide can be purified from cell 
lysates and extracts and/or from the culture medium supernatant, with methods used
individually or in combination, such as fractionation, chromatography methods,
immunoaffinity techniques using specific monoclonal or polyclonal antibodies, etc. 

The polypeptides according to the present invention can be obtained by
chemical synthesis, using one of the many known peptide synthesis methods, for example
techniques which implement solid phases or partial solid phases, condensation of
fragments, or conventional synthesis in solution.

The solid-phase synthesis technique is well known to those skilled in the art. 
See in particular Stewart et al. (1984) and Bodansky (1984). 

The polypeptides which are obtained by chemical synthesis and which can 
include corresponding unnatural amino acids are also included in the invention.

The mono- or polyclonal antibodies or their fragments, chimeric antibodies or 
immunoconjugates, characterized in that they are capable of specifically recognizing a 
polypeptide according to the invention, form part of the invention. 

Specific polyclonal antibodies can be obtained from a serum of an animal 
immunized against the polypeptides according to the invention, in particular produced
by genetic recombination or by peptide synthesis, according to conventional
procedures. 

The advantage of antibodies which specifically recognize certain polypeptides, 
variants or immunogenic fragments thereof, according to the invention, will in particular 
be noted.

The specific monoclonal antibodies can be obtained according to the 
conventional hybridoma culture method described by Kohler and Milstein, 1975. 

The antibodies according to the invention are, for example, chimeric antibodies, 
humanized antibodies, or Fab or F(ab')2 fragments. They can also be in the form of
labeled antibodies or immunoconjugates in order to obtain a detectable and/or
quantifiable signal. 

The invention also relates to methods for detecting and/or purifying a 
polypeptide according to the invention, characterized in that they use an antibody 
according to the invention. 

The invention also comprises purified polypeptides, characterized in that they
are obtained using a method according to the invention.

Moreover, besides their use for purifying the polypeptides, the antibodies of the 
invention, in particular the monoclonal antibodies, can also be used for detecting these 
polypeptides in a biological sample. 

They thus constitute a means of immunocytochemically or immunohistochemically
analyzing the expression of the polypeptides according to the
invention, in particular the polypeptide of sequence SEQ ID No. 3 or a variant thereof,
on specific tissue sections, for example by immunofluorescence or gold labeling, or
with an enzymatic immunoconjugate.

They may make it possible, in particular, to demonstrate abnormal expression
of these polypeptides in the biological samples or tissues, which makes them useful for
monitoring the progression of the disease and the molecular diagnosis. 

More generally, the antibodies of the invention can be advantageously used in 
any situation in which the expression of a normal or mutated polypeptide according to
the invention must be observed.

The methods for determining allelic variability, a mutation, a deletion, a loss of 
heterozygosity or any genetic abnormality of the SPG4 gene, according to the 
invention, characterized in that they use a nucleic acid sequence or an antibody 
according to the invention, also form part of the invention.

The present invention thus comprises a method for genotypic diagnosis of the
pathology associated with the SPG4 gene, characterized in that a nucleic acid 
sequence according to the invention is used. 

Preferably, the invention relates to a method for genotypic diagnosis of the 
disease associated with the presence of at least one mutation in a sequence of the
SPG4 gene, using a biological sample from a patient, characterized in that it includes 
the following steps: 

a) where appropriate, isolation of the genomic DNA from the biological sample to be 
analyzed, or production of cDNA from the RNA of the biological sample; 
b) specific amplification of said DNA sequence of the SPG4 gene likely to contain a
mutation, using primers according to the invention; 
c) analysis of the amplification products obtained and comparison of their sequence 
with the corresponding normal sequence of the SPG4 gene. 
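
Step c) above, the comparison of an amplified sequence with the corresponding normal sequence, can be illustrated with a minimal sketch. The function name and the two short sequences are hypothetical and are not SPG4 data; the sketch handles base substitutions only and assumes the two sequences are already aligned and of equal length, so insertions or deletions would require a real alignment step first.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (position, ref_base, obs_base)
        for position, (ref_base, obs_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != obs_base
    ]


# Toy sequences for illustration only (hypothetical, not taken from the SPG4 gene).
normal = "ATGGCTAGCTTAGGC"
patient = "ATGGCTAGATTAGGC"

differences = find_differences(normal, patient)
print(differences)
```

Each reported tuple gives a candidate position to be confirmed, for example by sequencing the opposite strand.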

The invention also comprises a method for diagnosing the disease associated 
with abnormal expression of a polypeptide encoded by the SPG4 gene, in particular the
polypeptide of sequence SEQ ID No. 3, characterized in that one or more antibodies
according to the invention is (are) brought into contact with the biological material to be
tested, under conditions which allow the possible formation of specific immunological
complexes between said polypeptide and said antibody or antibodies, and in that the
immunological complexes possibly formed are detected and/or quantified.

These methods are, for example, directed toward the methods for diagnosis, in 
particular antenatal diagnosis, of AD-HSP associated with the presence of a mutation 
in the SPG4 gene, according to the invention, by determining, using a biological 
sample from the patient, the presence of mutations in at least one of the sequences 
described above. The nucleic acid sequences analyzed may equally be genomic DNA,
cDNA or mRNA. 

Nucleic acids or antibodies based on the present invention may also be used to 
enable positive diagnosis in a patient or presymptomatic diagnosis in an individual at 
risk, in particular an individual with a family history of the disease. 

There are, of course, a great number of methods which make it possible to
demonstrate a mutation in a gene with respect to the wild-type gene. They can
essentially be divided into two main categories. The first type of method is that in which 
the presence of a mutation is detected by comparing the mutated sequence with the 
corresponding wild-type sequence, and the second type is that in which the presence
of the mutation is detected indirectly, for example through evidence of mismatches due
to the presence of the mutation. 

These methods can use the probes and primers of the present invention which 
have been described. They are generally purified nucleic acid hybridization sequences 
comprising at least 15 nucleotides, preferably 20, 25 or 30 nucleotides, characterized in
that they can hybridize specifically with a nucleic acid sequence according to the 
invention. 

Preferably, the specific hybridization conditions are such as those defined 
above or in the examples. The length of these nucleic acid hybridization sequences 
can range from 15, 20 or 30 to 200 nucleotides, particularly from 20 to 50 nucleotides.

Among the methods for determining allelic variability, a mutation, a deletion, a 
loss of heterozygosity or a genetic abnormality, preference is given to the methods
comprising at least one so-called PCR (polymerase chain reaction) or PCR-like
amplification step for the target sequence according to the invention likely to have an
abnormality, using a pair of primers having nucleotide sequences according to the
invention. The amplified products may be treated with a suitable restriction enzyme 
before carrying out the detection and assaying of the product targeted. 

The mutations of the SPG4 gene according to the invention may be responsible 
for various modifications of the translation product thereof, these modifications possibly 
being used for a diagnostic approach. Specifically, the antigenicity modifications linked
to these mutations may allow the development of specific antibodies. The mutated 
gene product can be distinguished using these methods. All these modifications can be 
employed in a diagnostic approach, using several well-known methods based on the 
use of mono- or polyclonal antibodies which recognize the normal polypeptide or 
mutated variants, such as for example by RIA or by ELISA.

Thus, a subject of the invention is also a kit or pack for diagnosis, in particular 
for diagnosing AD-HSP associated with the presence of a mutation in the SPG4 gene, 
according to the invention, characterized in that it comprises at least one compound 
chosen from the following group of compounds: 
a) a nucleic acid, in particular as a primer or probe, according to the present invention;
and
b) an antibody according to the invention.

In another aspect, the invention comprises a method for selecting a chemical or 
biochemical compound capable of preventing and/or treating AD-HSP associated with 
the SPG4 gene, characterized in that a nucleic acid sequence according to the
invention, a polypeptide according to the invention, a vector according to the invention,
a cell according to the invention, a mammal according to the invention or an antibody 
according to the invention is used. 

The methods for selecting chemical or biochemical compounds capable of 
interacting directly or indirectly with polypeptides according to the invention or with the
nucleic acids according to the invention, and/or making it possible to modulate the 
expression or the activity of these polypeptides, characterized in that they comprise 
bringing a polypeptide according to the invention, a transformed cell according to the 
invention or a mammal according to the invention into contact with a candidate
compound, and detecting a modification of the activity of said polypeptide, are also
included in the invention. 

For example, but without being limited thereto, mention may be made of a 
method for identifying molecules capable of interacting with a polypeptide according to 
the invention, using a bacterial or yeast two-hybrid system such as the Matchmaker
Two Hybrid System 2, according to the instructions of the manual which is supplied
with the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech).

The nucleic acids encoding proteins which interact with the promoter and/or 
regulatory sequences of the SPG4 gene, according to the invention, can be screened 
and/or selected using a one-hybrid system such as that described in the manual which
is supplied with the Matchmaker One Hybrid System kit from Clontech (Catalog No.
K1603-). 

In another aspect, the invention comprises the use of a nucleic acid or of a
polypeptide according to the invention, of a vector according to the invention, of a cell
according to the invention or of a mammal according to the invention, for studying the
expression or the activity of the SPG4 gene.

Other characteristics and advantages of the invention appear in the remainder 
of the description with the examples and figures, the legends of which are given 
hereinafter. 

LEGENDS OF THE FIGURES

FIGURES 1A, 1B and 1C : Physical map of the SPG4 range and genomic organization 
of SPG4. 

FIGURE 1A : The 1.5 Mb candidate region is delimited by the D2S352 and 
D2S2347 genetic markers indicated in bold characters. The position of the polymorphic 
markers and other STSs is indicated in standard characters, whereas the position of
the ESTs is indicated in italics. The BAC clones constituting the presequencing map
are represented by rectangles, with the name shown above and the precise size of the 
clone, if it could be determined, shown below. The name of the BACs A, B, C, etc. is 
followed by brackets containing the name of the clone preceded by a "b" if the clone is 
derived from the BAC library CITB_978_SKB, or by a "B" if it originates from the
library RPCI-11. 

FIGURE 1B : Schematic representation of the SPG4 gene which overlaps 
BACs D (b336P14) and G (B563N4). The exons are shown as black rectangles with 
their name above. 

FIGURE 1C : The five mutations identified in seven SPG4 locus-linked AD-HSP
families are positioned in exons 7, 11 and 13 and in the splice acceptor site of intron
15.

FIGURE 2 : Nucleic acid sequence of the SPG4 cDNA and protein sequence of spastin.

The 17 vertical bars with a number located below represent the junctions
between the various exons. The ATG initiator codon is located at nt position 126-128
and the STOP codon for termination is located at nt position 1974-1976. Five of the
mutations identified to date, including the loss of exon 16, are indicated in italics
(nt 1210, nt 1468, nt 1520, nt 1620 and for the loss of exon 16: nt 1813-1853). The
polyadenylation site is in italics and underlined. The putative nuclear localization signal
(NLS), RGKKK, and also the three conserved domains predicted by the analysis in the
ProDom database are located at aa positions 7-11 (NLS), 342-409 (domain 92),
411-509 (domain 179) and 512-599 (domain 6226), respectively. The four motifs
predicted by the sequence comparison in the Prosite database are: two "leucine
zipper" motifs at aa positions 50-78 and 508-529, the ATP binding site (or Walker A
motif) at aa positions 382-389 and the "helix-loop-helix" dimerization domain at aa
positions 478-486. The Walker A and B motifs, "GPPGNGKT" and "IIFIDE", and also
the AAA minimum consensus [lacuna] are underlined.
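
The coordinates quoted in this legend can be cross-checked with a few lines of arithmetic. The sketch below uses only the nt positions given above; the variable names are illustrative, and the reading that the coding sequence runs from the first nt of the ATG up to, but not including, the STOP codon is an assumption.

```python
# Positions quoted in the legend of Figure 2 (1-based nt coordinates).
ATG_START = 126              # first nt of the initiator codon
STOP_START = 1974            # first nt of the STOP codon
EXON16_LOSS = (1813, 1853)   # nt range given for the loss of exon 16

# Assumption: coding sequence = ATG through the last nt before the STOP codon.
coding_nt = STOP_START - ATG_START
codons = coding_nt // 3

# Length of the segment removed when exon 16 is lost (inclusive range).
exon16_nt = EXON16_LOSS[1] - EXON16_LOSS[0] + 1

print(coding_nt, coding_nt % 3, codons)
print(exon16_nt, exon16_nt % 3)
```

Under that assumption the coding sequence is a whole number of codons, and the highest aa position quoted in the legend (599) falls within it. The segment given for exon 16 is not a multiple of three nucleotides, so its loss would be expected to shift the reading frame downstream.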

FIGURES 3A and 3B : Characterization of a splice site mutation in the affected
individuals of three SPG4 locus-linked AD-HSP families.

FIGURE 3A : PCR amplification of fragment IV of the SPG4 cDNA using
lymphoblast cDNA: well M, size marker VII (Boehringer); well 1, unaffected member of
family 2992; well 2, patient of family 2992; well 3, unaffected member of family 5330; 
well 4, patient of family 5330; well 5, patient of family 5226; well 6, negative control 
(human genomic DNA).

FIGURE 3B : Sequence graph for the mutation of the splice acceptor site of
intron 15. 

Genomic sequence of the control individual above and of a patient of family 
2992 below. The asterisk at nt position 1813-4 indicates an A->C polymorphism which 
affects a nonconserved nucleotide of the splice acceptor site of intron 15 in the patient.

FIGURES 4A and 4B : Spastin homologies.

The identical residues are highlighted by shaded areas. 

FIGURE 4A : Multiple alignment created by CLUSTAL W of eight proteins 
derived from various organisms and having strong sequence homology with human 
spastin and murine spastin (SEQ ID No. 73).

FIGURE 4B : Alignment by CLUSTAL W of the yeast metalloproteases AFG3,
RCA1 and YME1, and of human paraplegin and spastin.

FIGURE 5: Alignment by BLASTN of the nucleic acid sequences of the SPG4 cDNA 
and of its mouse ortholog Spg4 (SEQ ID No. 72). The polyadenylation site of the 
murine cDNA is underlined and in italics. The STOP codon is located at nt position
1515-1517 in the murine cDNA and at nt position 1974-1976 in the human cDNA.

FIGURES 6A, 6B and 6C : PCR analysis of the expression of SPG4 and of its murine
ortholog Spg4. 

FIGURE 6A : Collection of cDNA originating from multiple mouse tissues.
Well M, size marker V (Boehringer); well 1, heart; well 2, brain; well 3, spleen;
well 4, lung; well 5, liver; well 6, skeletal muscle; well 7, kidney; well 8, testis; well 9,
E7 7-day embryo; well 10, E11 11-day embryo; well 11, E15 15-day embryo; well 12,
E17 17-day embryo; well 13, negative control (mouse genomic DNA).

FIGURE 6B : Collection of cDNA originating from multiple human tissues.
Well M, size marker VII (Boehringer); well 1, brain; well 2, heart; well 3, kidney;
well 4, liver; well 5, lung; well 6, pancreas; well 7, placenta; well 8, skeletal muscle;
well 9, negative control (human genomic DNA); well 10, negative control (no DNA).

FIGURE 6C : Collection of cDNA originating from multiple human fetal tissues. 
Well M, size marker VII (Boehringer); well 1, brain; well 2, heart; well 3, kidney; 
well 4, liver; well 5, lung; well 6, skeletal muscle; well 7, spleen; well 8, thymus; well 9,
negative control (human genomic DNA); well 10, negative control (no DNA).

EXAMPLES

Example 1: Materials and methods 

1) Subcloning and sequencing of the candidate region

Twelve BACs originating from two human genomic libraries, CITB_978_SKB
(sold by Research Genetics) and RPCI-11 (Osoegawa et al., 1998), and covering the
SPG4 range, were selected to be sequenced (Hazan et al., Genomics, 60 (3), 309-19,
1999). 40 µg of the DNA of each BAC were partially digested with the CviJI restriction
enzyme (CHIMERx) and separated by electrophoresis on a 0.4% LMP agarose gel
(FMC). DNA fractions with sizes in the region of 3, 5 and 10 kb were
eluted with β-agarase (Biolabs) and ligated to the plasmid vector pBAM3, which had
previously been digested with SmaI and dephosphorylated, in a ratio of 1 x insert per
5 x vector. Electrocompetent E. coli DH10B bacteria (GIBCO-BRL) were transformed
with the various ligations by electroporation. Approximately 1 000 to 1 500 subclones
per BAC (8 to 10 equivalent genomes), consisting of 20% of clones with inserts of
10 kb, 40% of clones with inserts of 5 kb and 40% of clones with inserts of 3 kb, were
isolated. The ends of the inserts of these clones were sequenced on a LICOR 4200
automatic sequencer. For each BAC, the sequences were assembled into a backbone
consisting of several contigs, using the Phred and Phrap programs. The gaps between
contigs were sequenced with labeled dideoxynucleotides on an ABI 377 sequencer
(PE-Applied Biosystems). The exons contained in these sequence contigs were
predicted with the GRAIL II, GENSCAN, FGENEH and Genie computer programs. The
sequences were also compared with the EMBL and GenBank nucleic acid and protein
databases, using the BLASTN and BLASTX programs. The determination of the
promoter sequences was carried out using the TSSG and TSSW computer programs.
The results of all these sequence analyses were visualized using the Genotator
sequence annotation program.

2) cDNA cloning 

The cDNA of the SPG4 gene was isolated through 5' and 3' RACE-PCR 
experiments on polyA+ RNAs of fetal brain, adult brain and adult liver, using the 
Marathon cDNA amplification kit (Clontech) according to the supplier's instructions. A
first PCR followed by an internal PCR were carried out with various pairs of primers, 
the sequences of which are indicated in Table 1 hereinafter:

Table 1

Primers used for the RACE-PCRs and the cDNA amplifications

Primer       Sequence (5'-3')              SEQ ID No.   5' position   Pair   PCR product size

SPA_5RACE5   CGGAGCTCCTCTTGGCTGCCATG       4            nt 405
SPA_5RACE6   AGAAGCGCTGGCAGAGCCACACGAAG    5            nt 372
SPA_5RACE7   AAGGCGACCAAACGCAGCAGCGCGAAG   6            nt 331
SPA_3RACE1   AGGAGCAAGCTGTGGAATGGTATAAG    7            nt 550
SPA_3RACE2   TGGTTATGGCCAAGGACCGCTTACAAC   8            nt 689
SPA_3RACE3   CAAACGGACGTCTATAATGACAGTAC    9            nt 747
SPA_3RACE4   TTAGGAATGTGGACAGCAACCTTGC     10           nt 1075
SPA_3RACE5   CTTCTCTGAGGCCTGAGTTGTTCAC     11           nt 1207
SPA_3RACE6   TGCTAGAATGACTGATGGATACTCAGG   12           nt 1736
SPA_3RACE7   AGATGCAGCACTGGGTCCTATCCG      13           nt 1787
SPA_3RACE8   ATGAACGTCATCGGCTACAGAAACAG    14           nt 2037

SPA_Db       TAGCAGTGGCTGCCGCCGT           15           nt 45         b+m    655 bp
SPA_Dm       AAGCGGTCCTTGGCCATAAC          16           nt 700
SPA_Dc       GGCGGCAGTGAGAGCTGTG           17           nt 106        c+n    543 bp
SPA_Dn       CTAGCTCTTTCACACTGTTC          18           nt 649

SPA_Ad       AACAGGCCTTCGAGTACATC          19           nt 487        d+m    746 bp
SPA_Am       CTGTGAACAACTCAGGCCTC          20           nt 1233
SPA_Ac       ATGAGAAAGCAGGACAGAAG          21           nt 532
SPA_An       TGCCAAGTCTTGACCAGC            22           nt 1175

SPA_Ba       CTACAACTGCTACTCGTAAG          23           nt 1036       a+m    763 bp
SPA_Bm       CAGTGCTGCATCTTTTGCC           24           nt 1799
SPA_Bb       TAGGAATGTGGACAGCAACC          25           nt 1076
SPA_Bn       AAAGCTGTTAGGTCACTTCC          26           nt 1780

SPA_Ca       TGGAGATGACAGAGTACTTG          27           nt 1550       a+m    766 bp
SPA_Cm       CTGGAATACTTTCATCTGC           28           nt 2316
SPA_Cb       ATGAGGCTGTTCTCAGGCG           29           nt 1603

The RACE-PCR products were cloned with the TA-cloning kit (Invitrogen) and
the corresponding clones were sequenced on an ABI 377 (PE-Applied Biosystems).
The sequence of the SPG4 transcript was verified by sequencing PCR products
amplified from a cDNA population originating from the lymphoblasts of 6 healthy
individuals.
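
As a consistency check on Table 1, the PCR product sizes listed for the five primer pairs appear to equal the difference between the 5' positions of the two primers of each pair. The sketch below only re-uses the positions and sizes from the table; the pairing of SPA_Ad with SPA_Am follows the primer pairs named in section 3) below, and the data structure and function name are illustrative.

```python
# (forward primer, 5' position), (reverse primer, 5' position), size listed in Table 1 (bp)
PRIMER_PAIRS = [
    (("SPA_Db", 45), ("SPA_Dm", 700), 655),
    (("SPA_Dc", 106), ("SPA_Dn", 649), 543),
    (("SPA_Ad", 487), ("SPA_Am", 1233), 746),
    (("SPA_Ba", 1036), ("SPA_Bm", 1799), 763),
    (("SPA_Ca", 1550), ("SPA_Cm", 2316), 766),
]


def position_difference(forward_pos: int, reverse_pos: int) -> int:
    """Difference between the 5' positions of the reverse and forward primers."""
    return reverse_pos - forward_pos


mismatches = []
for (fwd_name, fwd_pos), (rev_name, rev_pos), listed_size in PRIMER_PAIRS:
    computed = position_difference(fwd_pos, rev_pos)
    print(f"{fwd_name}/{rev_name}: computed {computed} bp, listed {listed_size} bp")
    if computed != listed_size:
        mismatches.append((fwd_name, rev_name))
```

An empty `mismatches` list indicates that the positions and sizes in the table agree with one another under this reading.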

3) Detection of mutations 

The total RNAs were extracted from lymphoblast lines of one affected individual
per family studied and of 6 control individuals, using the RNA PLUS kit (Bioprobe
Systems). The cDNA synthesis was carried out on 500 ng to 1 µg of RNA, with 100 pmol
of random hexameric primers (Pharmacia) and 200 units of SuperScript II reverse
transcriptase (Gibco BRL), under standard conditions. Four PCR amplifications,
generating overlapping fragments which cover all of the SPG4 open reading frame,
were carried out on the cDNAs of the patients and controls. Fragment I was amplified
with the SPA_Db/SPA_Dm primers, and then by internal PCR with the
SPA_Dc/SPA_Dn primers. Fragments II, III, and IV were amplified with the
SPA_Ad/SPA_Am, SPA_Ba/SPA_Bm and SPA_Ca/SPA_Cm primers (cf. the
sequences of these primers in Table 1), respectively. Each amplification was carried
out in a total volume of 50 µl containing 4 µl of cDNA (~ 1/7th of the prep.), 20 pmol of
each primer, 200 µM of dNTPs, 50 mM of KCl, 10 mM of Tris, pH 9, 1.5 mM MgCl2,
0.1% of Triton X-100, 0.01% of gelatin and 2.5 units of Taq polymerase (Cetus-PE). The
PCR reactions were carried out according to the "hot start" process: the Taq
polymerase is added at 92°C, after a first denaturation step of 5 min at 94°C. The
samples are subsequently subjected to 35 cycles of denaturation (94°C for 40 sec), of
hybridization (55°C for 50 sec, with the exception of fragment I: 58°C for 50 sec) and of
elongation (72°C for 1 min), followed by a final elongation step (5 min at 72°C). The
PCR products are sequenced on an ABI 377 automatic sequencer (PE-Applied
Biosystems), with the SPA_Dc/SPA_Dn, SPA_Ac/SPA_An, SPA_Bb/SPA_Bn and
SPA_Cb/SPA_Cm primers for fragments I, II, III and IV, respectively.
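
The cycling conditions above can be written down as a small data structure, which makes the programmed hold time and the final primer concentration easy to derive. This is an illustrative sketch only: the class and field names are hypothetical, the reaction volume is read as 50 µl, and ramp times between temperatures (which depend on the instrument) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    hybridization_s: int
    elongation_s: int
    final_elongation_s: int

    def hold_time_s(self) -> int:
        """Sum of programmed hold times, excluding temperature ramps."""
        per_cycle = self.denaturation_s + self.hybridization_s + self.elongation_s
        return self.initial_denaturation_s + self.cycles * per_cycle + self.final_elongation_s


# Values taken from the cDNA amplification conditions described above.
cdna_program = PcrProgram(
    initial_denaturation_s=5 * 60,
    cycles=35,
    denaturation_s=40,
    hybridization_s=50,
    elongation_s=60,
    final_elongation_s=5 * 60,
)

total_s = cdna_program.hold_time_s()

# 20 pmol of primer in 50 µl: pmol per µl is numerically equal to µM.
primer_concentration_um = 20 / 50

print(total_s, total_s / 60, primer_concentration_um)
```

The hybridization temperature (55°C, or 58°C for fragment I) does not affect the hold time and is therefore left out of the structure.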

The mutations were also sought or confirmed by sequencing the 17 predicted
exons of the SPG4 gene in the patients and controls. Each exon was amplified with the
corresponding "a+m" pair of primers (cf. Table 2 hereinafter), with the exception of
exon 1 (gSPAex1c/gSPAex1m), and exons 10, 11 and 12 which were co-amplified with
the gSPAex10a/gSPAex12m and gSPAex11a/gSPAex12m pairs of primers.

Table 2

PCR primers for amplifying and sequencing the exons

Exon         Product size   PCR program   Primer        Sequence (5'-3') (SEQ ID Nos. 30 to 71)

1            1048 bp        [illegible]   gSPAex1c      [illegible]
                                          gSPAex1m      [illegible]
                                          [illegible]   [illegible]
                                          gSPAex1n      [illegible]
2            [illegible]    [illegible]   gSPAex2a      [illegible]
                                          gSPAex2m      [illegible]
                                          [illegible]   [illegible]
3            [illegible]    1             gSPAex3a      GACCAAATTGGTGCATGCATG
                                          gSPAex3m      [illegible]
4            [illegible]    [illegible]   gSPAex4a      [illegible]
                                          gSPAex4m      [illegible]
                                          gSPAex4n      [illegible]
5            [illegible]    4             gSPAex5a      [illegible]
                                          gSPAex5m      [illegible]
                                          gSPAex5b      [illegible]
6            [illegible]    [illegible]   gSPAex6a      [illegible]
                                          gSPAex6m      [illegible]
7            420 bp         2             gSPAex7a      GTCATAGGGCTTAGGCTTC
                                          gSPAex7m      ATCATACTACCCACTTTTCC
8            647 bp         3             gSPAex8a      TGTTTGGGAAGATGCTACTG
                                          gSPAex8m      CTACTGAAGATAACGTACATG
9            1268 bp        1             gSPAex9a      CATTGATTGCCATGTATTGG
                                          gSPAex9m      AGAAGGCCAGAAATACTCAG
                                          gSPAex9b      GTACTTAAATCGGTAAATATGG
10, 11, 12   1061 bp        4             gSPAex10a     CTCAAGTCTTAGGAATGCAG
                                          gSPAex10b     GCACTTAACCAGGCTGTATG
             551 bp         3             gSPAex11a     CTCAGATGACTCACATAGC
                                          gSPAex12m     CTTTACTAGACTAATTCTCCTG
13           [illegible]    4             gSPAex13a     CAGATTCAAGAAGACAGATC
                                          gSPAex13m     [illegible]
                                          gSPAex13n     GGTAGTTCTTGTTTCTGCTC
14           [illegible]    4             gSPAex14a     CAAGTGTGGTGAATTATTGC
                                          gSPAex14m     GAGCTGAAAAGTATTCAGC
                                          [illegible]   [illegible]
15           [illegible]    [illegible]   gSPAex15a     AGCCTCTGGAGATAGTATGC
                                          gSPAex15m     CTAGAACAGGGGTCACAGTC
                                          gSPAex15n     TTGGACTTCTTAAACTTC
16           1404 bp        4             gSPAex16a     GCAGTATGCAAGAAATTGAAC
                                          gSPAex16m     GGCCTGTAATTTTCTTCTG
                                          gSPAex16b     GTACTGAATAGATACATGTAG
17           445 bp         3             gSPAex17a     GTGTAGCAGATCAACATAG
                                          gSPAex17m     CATCTTCAAGTTTGGTGCAC

Other than for exon 1, which is amplified using the Advantage GC genomic
PCR kit (Clontech) according to the supplier's instructions, four slightly different PCR
programs (1, 2, 3 and 4) were used to amplify the SPG4 exons (see Table 2). The
amplifications were all carried out in a volume of 50 µl containing 100 ng of genomic
DNA, 50 pmol of each primer, 250 µM of dNTPs, 1X Takara buffer and 1 unit of Takara
LA Taq polymerase (Takara Shuzo Co.). The PCR reactions were carried out according to
the "hot start" process: the Taq polymerase is added at 94°C, after a first denaturation
step of 5 min at 96°C. The samples are subsequently subjected to 30 cycles of
denaturation (94°C for 40 sec), of hybridization (prog. 1: 60°C for 50 sec; prog. 2: 58°C
for 50 sec; prog. 3 and 4: 55°C for 50 sec) and of elongation (prog. 1 and 4: 72°C for
1 min; prog. 2 and 3: 72°C for 40 sec), followed by a final elongation step (10 min at
72°C). The sequencing of these PCR products was carried out on an ABI 377
sequencer (PE-Applied Biosystems), using either the PCR primers or the internal
primers termed "b" and "n" (see Table 2).
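
The four genomic PCR programs differ only in hybridization temperature and elongation time, which a lookup table captures compactly. The sketch below is illustrative: the dictionary layout and function name are hypothetical, the values are those given in the paragraph above, and instrument ramp times are ignored.

```python
# Per-program settings described above: hybridization temperature (°C) and elongation time (s).
PROGRAMS = {
    1: {"hybridization_c": 60, "elongation_s": 60},
    2: {"hybridization_c": 58, "elongation_s": 40},
    3: {"hybridization_c": 55, "elongation_s": 40},
    4: {"hybridization_c": 55, "elongation_s": 60},
}

# Settings shared by all four programs.
INITIAL_DENATURATION_S = 5 * 60   # 5 min at 96°C
CYCLES = 30
DENATURATION_S = 40               # 94°C
HYBRIDIZATION_S = 50
FINAL_ELONGATION_S = 10 * 60      # 10 min at 72°C


def hold_time_s(program: int) -> int:
    """Programmed hold time of one run, excluding temperature ramps."""
    per_cycle = DENATURATION_S + HYBRIDIZATION_S + PROGRAMS[program]["elongation_s"]
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_ELONGATION_S


hold_times = {program: hold_time_s(program) for program in PROGRAMS}
print(hold_times)
```

Programs sharing an elongation time have the same hold time, so in this reading exons assigned to programs 1 and 4, or to programs 2 and 3, differ only in their hybridization temperature.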

4) Characterization of SPG4

The cDNA clones 977312 (EST AA560327) and 568234 (EST AA107866)
derived from the mouse blastocyst and E8 embryo cDNA libraries, which both
correspond to the murine ortholog of SPG4, were obtained from the IMAGE consortium
and sequenced in the laboratory on an ABI 377 sequencer (PE-Applied Biosystems). In
order to analyze the expression profile of SPG4 and of its murine ortholog Spg4, the
collections of cDNA from various fetal and adult human tissues, and also from mouse
tissues (MTC panels, Clontech), were tested by PCR according to the supplier's
protocol, with the SPA_Ca/SPA_Cm pair of primers for the human cDNAs and the
SPA_Ca/spam (spam: 5'-ACCGAAGTCAAGAGCCTATC-3') pair for the mouse
cDNAs. The PCR conditions are those used for amplifying SPG4 from lymphoblast line
cDNA (cf. § Detection of mutations), except that these samples were subjected to
32 cycles for the cDNAs derived from adult human tissues and from mouse tissues,
and to 28 cycles for the cDNAs derived from fetal tissues. The amplification products
were separated by electrophoresis on 2% agarose gels.

5) Histological analysis of a muscle biopsy from a patient 

The histological and histo-enzymatic analyses were carried out on a muscle 
biopsy from a patient derived from an SPG4 locus-linked family according to the 
standard techniques described in Casari et al., 1998.

6) Accession numbers in the public databases 

The SPG4 (or SPAST) cDNA and the deduced protein sequence, 
GenBank/EMBL AJ246001; the incomplete Spg4 cDNA clone, GenBank/EMBL 
AJ246002; the SPG4 (or SPAST) gene, GenBank/EMBL AJ246003. 

Example 2 : Analysis of the sequence of the SPG4 range

The analysis of the recombination events made it possible to reduce the SPG4
candidate region to a genetic range of 0 cM between the D2S352 and D2S2347
markers (19, 20). A presequencing map of the SPG4 range composed of 37 BACs was
constructed (Hazan et al., in press in Genomics); the candidate region covers a
physical distance of approximately 1.5 Mb. Twelve overlapping BACs, stretching
over the SPG4 region, with the exception of a single 4 kb gap between clones A and
E, were selected to be sequenced (fig. 1A). Seven of these BACs (A, B, C, D, E, F and
G), covering approximately 70% of the region of interest, have already been
sequenced. The sequences of these 7 BACs were compared with those of the nucleic
acid and protein databases, and analyzed with four exon prediction programs. These
preliminary sequence analyses made it possible to reveal 14 potential transcription
units, including three corresponding to the genes encoding xanthine dehydrogenase,
steroid 5α-reductase 2 and a TGFβ-binding protein. Of the 14 genes detected by the
sequence analysis, 9 had been previously identified in the EST (for "Expressed
Sequence Tag") databases and located in the SPG4 range (Hazan et al., in press in
Genomics); the 5 remaining genes could only be identified by sequencing the
candidate region. One of these 5 novel genes showed homology, in the 3' part of its
coding region, with the genes encoding the AAA protein family (Confalonieri et al., 1995). More
thorough sequence analyses showed that this gene, named SPG4 (or SPAST), was
composed of 17 exons and extended over a region of approximately 90 kb, covered by
two adjacent BAC clones, D and G (cf. fig. 1B). The first three predicted exons of this
gene were identified in BAC D, by two of the four exon prediction programs used,
GRAIL II and GENSCAN; they show strong homology with a mouse blastocyst EST,
AA560327. The last 14 exons are found in BAC G. The protein sequence deduced
from exons 7 to 17 is significantly homologous to a subclass of the AAA family, which
includes the Yta6p (Schnall et al., 1994), TBP6 (Schnall et al., 1994) and End13 yeast
proteins, and also the SKD1 mouse protein (Perier et al., 1994).

Of the four exon prediction programs, FGENEH appears to be the most reliable
and the most powerful, enabling detection of most of the genes of this chromosomal
region at 2p21-p22. This observation also applies to the SPG4 gene, for which 15
exons could be demonstrated using this program, while only 4, 9 or 11 exons could be
located using the Genie, GRAIL II and GENSCAN programs, respectively. The
genomic organization of this gene (fig. 1B) could subsequently be confirmed by
determining the sequence of the SPG4 cDNA. The intron/exon junctions are
shown in Table 3 hereinafter: the exon size ranges from 41 bp (exon 16) to
1.410 kb (exon 17), that of the introns ranging from 140 bp (intron 11) to 23.247 kb
(intron 1).
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Example 3 : Identification of the SPG4 cDNA 

Several successive amplifications by 5' and 3' RACE-PCR were carried out on
collections of adult liver and brain and fetal brain cDNA, in order to characterize the SPG4
transcript. All the 5' RACE-PCRs gave amplification products terminating at nt position
263 of the SPG4 cDNA (fig. 2), which was probably due to the high GC content of the 5'
region of the transcript (90% GC in the 60 bp preceding nt position 263). Four
overlapping PCR products, covering all of the coding region, were amplified from the
cDNAs derived from the lymphoblasts of six control individuals, and entirely sequenced
with the aim of verifying the sequence of the SPG4 transcript. Aligning the sequences of
all the PCR and RACE-PCR products made it possible to reconstitute a 3263 bp
sequence comprising a 1848 bp open reading frame preceded by a 125 bp 5' untranslated
region (5' UTR for "5' UnTranslated Region") and followed by a 1290 bp 3' UTR
including a polyadenylation site between nt positions 3227-3232, ~ 35 bp upstream of the
polyA tail (fig. 2). Comparing the sequence of the SPG4 cDNA with the EST databanks
made it possible to detect significant homology with 6 human ESTs, including
EST N47973 which contains a more extended 3' noncoding region (+ 180 bp) comprising
a second polyadenylation site. The translation initiation site was identified by the presence
of a Kozak consensus sequence (CTGTGAatgA) defined as a "suitable context" for
translation initiation given that a purine is located 3 nt upstream of the initiator ATG, itself
preceded by a STOP codon. The 3263 bp cDNA sequence is identical to the transcribed
sequence deduced from the 17 exons of the SPG4 gene. The analysis of the sequence of
the 5' region using the TSSG and TSSW computer programs suggests the presence of a
promoter sequence of the TATA box type located 43 bp upstream of nt position 1 of
exon 1.
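The segment lengths quoted above can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the function and variable names are illustrative and do not come from the application, and the coordinates are derived purely from the three lengths stated in the text (125 bp 5' UTR, 1848 bp open reading frame, 1290 bp 3' UTR).

```python
# Segment lengths (in bp) as quoted in the text; the names are illustrative.
UTR5_LEN = 125
ORF_LEN = 1848
UTR3_LEN = 1290


def cdna_layout(utr5, orf, utr3):
    """Return 1-based inclusive (start, end) coordinates of each cDNA segment."""
    orf_start = utr5 + 1
    orf_end = utr5 + orf
    return {
        "5'UTR": (1, utr5),
        "ORF": (orf_start, orf_end),
        "3'UTR": (orf_end + 1, utr5 + orf + utr3),
    }


layout = cdna_layout(UTR5_LEN, ORF_LEN, UTR3_LEN)
total_length = layout["3'UTR"][1]   # last nucleotide of the reconstituted cDNA
codons = ORF_LEN // 3               # number of triplets in the open reading frame

print(layout, total_length, codons)
```

Under these assumptions the three segments add up to the 3263 bp of the reconstituted cDNA, and the open reading frame divides into 616 triplets, in agreement with the 616 aa protein described in Example 5.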

Example 4 : Mutations in the SPG4 gene

Heterozygous mutations were sought in the SPG4 cDNA originating from
lymphoblasts of 14 patients derived from SPG4 locus-linked families (1 affected individual
per family). Four overlapping PCR fragments, I, II, III and IV, covering the open reading
frame of the SPG4 cDNA, were amplified and sequenced in the 14 patients, and also in 6
healthy control individuals. The agarose gel electrophoresis of PCR fragment IV showed
three bands of equal intensity in 3 patients from families 2992, 5226 and 5330 originating
from the same region of Switzerland, which would suggest a microdeletion or a mutation
of a splice site; the two additional bands were not present in 2 healthy individuals derived
from families 2992 and 5330 (fig. 3A). The genomic sequence of exon 16 revealed a
heterozygous A->G mutation of the splice acceptor site (AG) of intron 15 in the affected
individuals of these three families (fig. 3B); this mutation causes the loss of exon 16,
followed by a reading frame shift in the abnormal transcript. None of the healthy members,
including husbands and wives, carries this mutation of the splice site. The identification of
the same mutation in all the affected members of these three Swiss families demonstrates
the existence of a common ancestor, which had probably been suggested by the study of
the haplotypes.
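The link between the loss of exon 16 and the reading frame shift follows from the exon length: Example 2 gives exon 16 as 41 bp, which is not a multiple of 3. A minimal sketch of that check is given below; the function name is hypothetical and the rule it encodes is simply that removing a number of nucleotides not divisible by 3 changes the downstream frame.

```python
EXON16_LEN_BP = 41  # exon 16 size as stated in Example 2


def skipping_shifts_frame(exon_length_bp):
    """True if removing an exon of this length changes the downstream reading frame."""
    return exon_length_bp % 3 != 0


# Number of nucleotides left over after removing whole codons from the skipped exon.
leftover = EXON16_LEN_BP % 3

print(skipping_shifts_frame(EXON16_LEN_BP), leftover)
```

An exon of 42 bp, by contrast, would be removed in frame and would delete 14 codons without shifting the rest of the sequence.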

Three point mutations, 1210C->G, 1468G->A and 1620C->T, which introduce
amino acid substitutions into the protein sequence (S362C, C448Y and R499C,
respectively), were revealed by sequencing PCR fragments III and IV in the affected
individuals of families 624, 4014 and 618. These three substitutions all involve a cysteine
residue, causing the loss or gain of a cysteine in the protein sequence. A 1 bp deletion,
1520delT, which creates a premature STOP codon and thus a truncated protein
composed of 465 amino acids (aa), was detected in the affected individuals of family A.
None of the five mutations summarized in table 4 hereinafter was found in the control
individuals tested, whether healthy siblings or spouses from the
seven families analyzed herein. These five mutations significantly affect the protein
sequence in a highly conserved domain, the AAA cassette (Beyer, 1997), which is composed
of several protein motifs presumed to be responsible for the ATPase activity in all the
members of the AAA family.

In addition to the five mutations described above, searches for heterozygous
mutations, carried out on patients suffering from AD-HSP derived from 36 other families,
revealed 34 other mutations which modify or are likely to modify the
product of expression of the SPG4 gene.
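The correspondence between the cDNA positions and the amino acid positions of the point mutations described above can be reproduced from the 125 bp 5' UTR stated in Example 3. The sketch below assumes that nucleotide 126 of the cDNA is the first base of codon 1; the helper names are hypothetical and are not part of the application.

```python
UTR5_LEN = 125  # length of the 5' UTR stated in Example 3


def codon_number(cdna_nt, utr5_len=UTR5_LEN):
    """Map a 1-based cDNA nucleotide position to its 1-based codon number."""
    offset = cdna_nt - utr5_len  # 1-based position within the open reading frame
    if offset < 1:
        raise ValueError("position lies in the 5' UTR")
    return (offset + 2) // 3


def codon_position(cdna_nt, utr5_len=UTR5_LEN):
    """Return 1, 2 or 3: the position of the nucleotide within its codon."""
    if cdna_nt <= utr5_len:
        raise ValueError("position lies in the 5' UTR")
    return (cdna_nt - utr5_len - 1) % 3 + 1


for nt, change in [(1210, "S362C"), (1468, "C448Y"), (1620, "R499C")]:
    print(nt, change, codon_number(nt), codon_position(nt))
```

With this convention the three substitutions quoted in the text fall in codons 362, 448 and 499, which matches the residue numbers in S362C, C448Y and R499C.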

The characteristics of these 34 other mutations are summarized in table 5
hereinafter, which also includes the five mutations mentioned above.

Table 5

Mutations in SPG4 in the patients suffering from AD-HSP

Family       Location    Mutation (a)            Amino acid change (b)         Consequence
624          exon 7      1210 C->G               S362C                         missense
6958         exon 8      1233 G->A               G370R                         missense
214          exon 8      1267 T->G               F381C                         missense
1002         exon 8      1283 T->G               N386K                         missense
027          exon 8      1288 A->G               K388R                         missense
019          exon 10     1401 C->G               L426V                         missense
4014         exon 11     1468 G->A               C448Y                         missense
148          exon 11     1504 G->T               R460L                         missense
618          exon 13     1620 C->T               R499C                         missense
636          exon 15     1788 G->A               D555N                         missense
627          exon 15     1792 C->T               A556V                         missense
[illegible]  exon 3      702 C->T                Q193STOP                      nonsense
3655         exon 5      873 A->T                K229STOP                      nonsense
[illegible]  exon 5      907 C->A                S261STOP                      nonsense
[illegible]  exon 5      932 C->G                Y269STOP                      nonsense
6922         exon 10     1416 C->T               R431STOP                      nonsense
616          exon 10     1416 C->T               R431STOP                      nonsense
605          exon 15     1809 C->T               R562STOP                      nonsense
030          exon 2      578-579insA             PTC + 2 aa                    shift + nonsense
615          exon 5      852del11                PTC + 18 aa                   shift + nonsense
042          exon 5      882-883insA             PTC + 12 aa                   shift + nonsense
032          exon 5      906delT                 PTC + 17 aa                   shift + nonsense
189          exon 9      1299delG                PTC + 3 aa                    shift + nonsense
[illegible]  exon 9      [illegible]             PTC + [illegible] aa          shift + nonsense
625          [illegible] [illegible]             PTC + [illegible] aa          shift + nonsense
A            exon 11     1520delT                PTC + 7 aa                    shift + nonsense
115          exon 12     1574delGG               PTC + 2 aa                    shift + nonsense
3266         exon 13     1634del22               PTC + 18 aa                   shift + nonsense
149          exon 14     1684-1685insTT          PTC + 9 aa                    shift + nonsense
645          exon 14     1685del4                PTC + 7 aa                    shift + nonsense
029          intron 4    808-2 a->g              ?                             splice site mutation
162          intron 6    1129+2 t->[illegible]   ?                             splice site mutation
125          intron 7    1223+1 g->t             ?                             splice site mutation
143          intron 8    1299+1 g->a             ?                             splice site mutation
1620         intron 11   1538+5 g->a             (PTC + 6 aa)                  loss of exon 11 + shift
1006         intron 11   1538+3 del4             ?                             splice site mutation
1605         intron 13   1661+1 g->t             ?                             splice site mutation
1012         intron 13   1662-2 a->t             ?                             splice site mutation
1626         intron 15   1812+1 g->a             ?                             splice site mutation
2992         intron 15   1813-2 a->g             Δ aa564->aa576 (PTC + 7 aa)   loss of exon 16 + shift
5226         intron 15   1813-2 a->g             Δ aa564->aa576 (PTC + 7 aa)   loss of exon 16 + shift
5330         intron 15   1813-2 a->g             Δ aa564->aa576 (PTC + 7 aa)   loss of exon 16 + shift
1611         intron 16   1853+1 g->a             ?                             splice site mutation

(a) The nt positions refer to the sequence of the SPG4 cDNA. (b) The aa positions refer to the spastin sequence.
The exon bases are indicated in upper case, those of the introns in lower case. PTC + n aa: "premature
termination codon" at n amino acids downstream of the mutation.

Example 5 : Analysis of the protein sequence of spastin

The open reading frame of SPG4 encodes a 616 aa protein which we have named
spastin and the molecular weight of which is approximately 67.2 kDaltons (kD). The
comparison of this amino acid sequence with the protein databases, using the BLAST
programs, revealed a region of strong homology with several members of
the AAA family, at the C-terminal end of spastin. The "typical" motifs of the AAA family,
encompassed in the AAA cassette, are located between aa positions 342 and 599 (see
fig. 2) according to the sequence comparisons in the ProDom and Prosite protein domain
databases. The three conserved typical domains, namely the Walker A and B motifs and
the minimum consensus motif of the AAA proteins, are located in the AAA cassette at
aa positions 382-389, 437-442 and 480-498, respectively (fig. 2). The Walker A motif,
"GPPGNGKT", also called the P-loop, which corresponds to the ATP-binding domain, and the
B motif, "IIFIDE", are highly conserved among all the members of the AAA family, including
spastin.
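A motif such as the Walker A sequence quoted above can be located in a protein sequence with a simple pattern search. The sketch below is illustrative only: the pattern G-x(4)-G-K-[S/T] is the commonly cited Walker A consensus rather than a definition taken from this application, the helper name is hypothetical, and the toy sequence is an artificial string (alanine spacers around the two motifs quoted in the text), not the spastin sequence.

```python
import re

# Commonly cited Walker A (P-loop) consensus: G, any 4 residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_motifs(seq, pattern):
    """Return (start, end, match) tuples using 1-based inclusive coordinates."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq)]


# Artificial toy sequence: the two motifs from the text separated by alanine spacers.
toy = "AAAA" + "GPPGNGKT" + "AAAA" + "IIFIDE" + "AAAA"

hits = find_motifs(toy, WALKER_A)
b_start = toy.find("IIFIDE") + 1  # 1-based start of the B motif in the toy sequence

print(hits, b_start)
```

Applied to a complete, verified protein sequence, the same search would be expected to report the motif at the aa positions given in the text, but that has not been checked here.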

The comparison of the AAA cassettes present in 150 proteins of this ATPase
family, derived from organisms which are very far apart in evolution, made it possible to
classify this set of proteins into several subgroups, as a function of the number of AAA
cassettes identified (1 or 2) and of the sequence homologies between these various
cassettes (Beyer, 1997). Among all the proteins of the AAA family, spastin shows stronger
homology with a particular subclass of the AAAs, and more specifically with the following
proteins, most of which were identified through the complete sequencing of the genome of
the organism in question: two proteins of Caenorhabditis elegans, O16299 and Q18128;
two subunits of the 26S proteasome of Saccharomyces cerevisiae, Yta6p (Q02845) and
TBP6 (P40328) (Schnall et al., 1994); a subunit of the proteasome of
Schizosaccharomyces pombe (O43078); the SAP1 (P39955) and END13 (P52917)
proteins of S. cerevisiae and the murine SKD1 protein (P46467) (Perier et al., 1994). The
multiple alignment of these 8 proteins with spastin is represented in fig. 4A. Over the 257
amino acids encompassing the AAA cassette (aa positions 342-599), spastin shows 52%,
51% and 50% sequence identity with the Yta6p (Q02845) yeast protein, the O16299
nematode protein and the TBP6 (P40328) yeast protein, respectively. Similar results were
obtained by analyzing the protein sequence of spastin in the ProDom database, which
showed the existence of three domains of homology (named 92, 179 and 6226, and
corresponding to aa positions 342-409, 411-509 and 512-599) found in the putative
subunits of the 26S proteasome of yeast. In addition, the members of this AAA subgroup
most commonly contain motifs of the leucine-zipper type, two of which could be detected
in the protein sequence of spastin at aa positions 50-78 and 508-529, by analyzing the
sequence in the Prosite database (see fig. 2). This analysis was also able to predict the
presence of a dimerization motif of the helix-loop-helix type, located between aa positions
478 and 486.

The comparison of the protein sequence of spastin with those of the mitochondrial
metalloproteases, such as the AFG3, RCA1 and YME1 yeast proteins, and also
paraplegin, which is implicated in a rare form of AR-HSP, shows that the homology
between these five members of the AAA family is limited to the 257 aa region
encompassing the AAA cassette (fig. 4B). In this region, the sequence identity between
spastin and paraplegin is only 29%, whereas paraplegin and the AFG3 yeast protein are
57% identical over this same portion of the protein sequence. This sequence comparison
suggests that spastin does not belong to the same AAA subgroup as paraplegin and other
mitochondrial metalloproteases. In addition, the computer analysis of the spastin
sequence using the PSORT II program, which makes it possible to predict the subcellular
location of the proteins, appears to indicate that spastin is a nuclear protein. A possible
nuclear localization signal (NLS), RGKKK, was revealed between aa positions 7 and 11,
whereas no signal peptide characteristic of importation into mitochondria could be
detected, unlike what had been observed for paraplegin.
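The sequence identity percentages quoted in this example are the kind of figure obtained by counting identical residues over the columns of a pairwise alignment. The sketch below shows one common way of computing such a value; it assumes that columns containing a gap in either sequence are excluded from the count, which may differ from the convention used by the programs cited in the application, and the function name and the short test strings are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Illustrative strings only: the motif from the text against a one-residue variant.
example = percent_identity("GPPGNGKT", "GPPGSGKT")
print(example)
```

Other definitions divide by the length of the shorter sequence or by the full alignment length, so values computed this way are only comparable when the same convention is used throughout.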
Example 6 : Expression profiles for SPG4 and for its murine ortholog Spg4

The comparison of the nucleic acid sequence of SPG4 with the EST databanks
made it possible to detect several human, murine and rat ESTs showing strong homology
with SPG4. The mouse blastocyst and E8 embryo cDNA clones corresponding to two of
the murine ESTs, AA560327 and AA107866, were obtained from the IMAGE consortium
and entirely sequenced. The assembly of the sequences of these cDNA clones made it
possible to reconstitute a 1689 bp consensus sequence including a 1514 bp incomplete
open reading frame. The comparison between the human SPG4 cDNA and this mouse
cDNA showed that the murine transcript lacks approximately 460 bp at the 5' end,
including the translation initiation codon. The mouse open reading frame is followed by a
175 bp 3' noncoding region (3' UTR) containing a polyadenylation site located ~20 bp
upstream of the polyA tail (fig. 5). The nucleic acid sequence of SPG4 and the protein
sequence of human spastin show 89% (between nt positions 460 and 1982) and 96%
(between aa positions 113 and 616) identity, respectively, with the mouse cDNA and
deduced protein sequences. This considerable degree of homology makes it possible to
conclude that this mouse transcript corresponds to the murine ortholog of SPG4, which was
therefore named Spg4.

The hybridization of Northern blots comprising the mRNAs of various human and
murine tissues (Clontech) with the SPG4 and Spg4 cDNA clones did not give any
convincing results, except a very weak band corresponding to a 2.5 kb transcript in the
mouse testis after exposure for 10 days. Because of the low level of expression of this
gene, the expression profiles for SPG4 and Spg4 were determined by PCR experiments
on normalized collections of cDNA originating from various adult and fetal tissues (see
fig. 6A to 6C). The murine Spg4 gene is expressed ubiquitously in the adult tissues of
mice, and also from the E7 stage to the E17 stage of mouse embryos (fig. 6A). Higher
expression of Spg4 was detected in the liver, skeletal muscle and testis, and also at the
E15 stage of embryos. The early expression of Spg4 during embryonic development was
confirmed by the presence of ESTs originating from blastocyst, E8 embryo and embryonic
carcinoma cDNA libraries in the public EST databanks. The human SPG4 gene is
likewise expressed ubiquitously in adult (fig. 6B) and fetal (fig. 6C) tissues, with perhaps more
marked expression in fetal brain.

Example 7 : No oxidative phosphorylation impairment in SPG4 locus-linked AD-HSP

In order to determine whether spastin mutations induced an oxidative
phosphorylation (OXPHOS) impairment in mitochondria, in the same way as had been
observed for paraplegin, a muscle biopsy was performed on a patient from one of the
SPG4 locus-linked AD-HSP families. The morphological and histo-enzymatic analyses of
this muscle biopsy did not reveal any muscle fibers of the RRF (for "ragged red fiber")
type, characteristic of OXPHOS impairments in mitochondria. The fact that all the muscle
fibers appear to be normal, and also the prediction of a nuclear localization for spastin,
seem to indicate that SPG4 locus-linked AD-HSP is not a mitochondrial disease of the
OXPHOS type, unlike SPG7 locus-linked AR-HSP.


Using a positional cloning approach based on sequencing a 1.5 Mb region, we
have identified the SPG4 (or SPAST) gene responsible for the most common form of
AD-HSP, previously located on chromosomal bands 2p21-p22. Thirty-nine mutations
which modify or are likely to modify the gene product, named spastin, could be detected in
the affected individuals from forty-one families with AD-HSP showing a link to the SPG4
locus. Spastin is a novel member of the AAA protein family, which appears to have a
nuclear localization and which shows strong homology with the subunits of the 26S
proteasome of yeast. Despite great homology restricted to a domain of 230 to 250 aa,
termed AAA cassette, the many members of this protein family can participate in very
varied cellular mechanisms, such as the transport of proteins in vesicles, cell cycle
regulation, organelle biogenesis, control of transcription, etc. However, all these
cellular mechanisms involve the assembly, the functioning or the degradation of protein
complexes, which suggests that the members of the AAA family are so-called "chaperone"
proteins.

CLAIMS

1. Purified or isolated nucleic acid of the SPG4 gene, characterized in that it
comprises a sequence chosen from the group comprising:

a) the sequence SEQ ID No. 1, the sequence SEQ ID No. 2, the sequence SEQ ID
No. 72, the sequence SEQ ID No. 106 or the sequence of at least 15 consecutive
nucleotides of one of these sequences;
b) the nucleic acid sequences which are homologs or variants of the sequences SEQ ID
No. 1, SEQ ID No. 2, SEQ ID No. 72 or SEQ ID No. 106; and
c) the complementary sequence or the RNA sequence corresponding to the sequences
as defined in a) and b).

2. Purified or isolated nucleic acid according to claim 1, with the exception of 
the nucleic acid identified in the GenBank databank under the accession number 
AB029006. 

3. Purified or isolated nucleic acid according to claim 1 or 2, characterized in
that it comprises at least one sequence of at least 15 consecutive nucleotides of the
nt 714-809, ends inclusive, fragment of the sequence SEQ ID No. 2, of the sequence
complementary thereto or of the sequence of the corresponding RNA thereof.

4. Purified or isolated nucleic acid according to one of claims 1 to 3,
characterized in that it comprises a mutation corresponding to a natural polymorphism in
humans.

5. Probe or primer, characterized in that it comprises a sequence of a nucleic
acid according to one of claims 1 to 4.

6. Probe or primer according to claim 5, characterized in that its sequence is
chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71.

7. Splice acceptor or donor site, characterized in that it comprises a sequence 
of a nucleic acid according to claim 1 chosen from the sequences SEQ ID No. 74 to SEQ 
ID No. 105. 

8. Method for screening cDNA or genomic DNA libraries, or for cloning
isolated genomic DNA or cDNA encoding spastin, characterized in that it uses a nucleic acid
sequence according to one of claims 1 to 7.

9. Method according to claim 8, for identifying the genomic or cDNA sequence
of the SPG4 gene of mammals, in particular of mice.

10. Method for identifying a mutation carried by the human SPG4 gene,
characterized in that it uses a nucleic acid sequence according to one of claims 1 to 7.

11. Method according to claim 10, for identifying a mutation responsible for
autosomal dominant hereditary spastic paraplegia.

12. Method for identifying the nucleic acid sequences which promote and/or
regulate the expression of the SPG4 gene, characterized in that it uses a nucleic acid
sequence according to one of claims 1 to 7.

13. Nucleic acid identified using a method according to one of claims 9 to 12.

14. Polypeptide encoded by a nucleic acid according to one of claims 1 to 4
and 13.

15. Polypeptide according to claim 14, preferably with the exception of the 584
amino acid peptide, the sequence of which is identified in the GenBank databank under
the accession number AB029006.

16. Polypeptide according to claim 14 or 15, characterized in that it comprises
an amino acid sequence chosen from the group comprising:

a) the sequence SEQ ID No. 3, the sequence SEQ ID No. 73, the sequence SEQ ID
No. 107 or the sequence of at least 10 consecutive amino acids of one of these
sequences; and
b) the sequences which are homologs or variants of the sequences SEQ ID No. 3, SEQ
ID No. 73 or SEQ ID No. 107.

17. Polypeptide according to claim 14 or 15, characterized in that it comprises
the sequence of at least 8 consecutive amino acids of the sequence of the aa 197-228,
ends inclusive, fragment of the sequence SEQ ID No. 3.

18. Polypeptide according to claim 14 or 15, characterized in that it comprises
an amino acid sequence chosen from the group comprising the sequence SEQ ID No. 3,
the sequence SEQ ID No. 73, the sequence SEQ ID No. 107, said sequences carrying
at least one of the mutations corresponding to a natural polymorphism in humans, and the
sequences of the fragments thereof of at least 10 consecutive amino acids.

19. Cloning and/or expression vector containing a nucleic acid sequence
according to one of claims 1 to 4, and 13.

20. Vector according to claim 19, characterized in that it includes the elements
required for its expression in a host cell.

21. Host cell transformed with a vector according to claim 19 or 20.

22. Mammal, except a human, characterized in that it comprises a cell 
according to claim 21. 

23. Mammal, except a human, according to claim 22, comprising a transformed
cell, characterized in that the sequence of at least one of the two alleles of the SPG4 gene
contains at least one of the mutations corresponding to a natural polymorphism in humans
or identified using a method according to claim 10 or 11.

24. Use of a nucleic acid sequence according to one of claims 5, 6 and 13, as a
probe or primer, for detecting and/or amplifying nucleic acid sequences.

25. Use of a nucleic acid sequence according to one of claims 1 to 7, and 13,
for screening a genomic or cDNA library.

26. Use of a nucleic acid sequence according to one of claims 1 to 4 and 13,
for producing a recombinant or synthetic polypeptide.

27. Method for producing a recombinant polypeptide, characterized in that a
transformed cell according to claim 21 is cultured under conditions which allow the
expression of said recombinant polypeptide, and in that said recombinant polypeptide is
recovered.

28. Polypeptide, characterized in that it is obtained using a method according
to claim 27.

29. Mono- or polyclonal antibodies or their fragments, chimeric antibodies or
immunoconjugates, characterized in that they are capable of specifically recognizing a
polypeptide according to one of claims 14 to 18, and 28.

30. Method for detecting and/or purifying a polypeptide according to one of
claims 14 to 18, and 28, characterized in that it uses an antibody according to claim 29.

31. Method for genotypic diagnosis of AD-HSP associated with the SPG4
gene, characterized in that a nucleic acid sequence according to one of claims 1 to 7 and
13 is used.

32. Method for genotypic diagnosis of AD-HSP associated with the presence of
at least one mutation on a sequence of the SPG4 gene, using a biological sample from a
patient, characterized in that it includes the following steps:

a) where appropriate, isolation of the genomic DNA from the biological sample to be
analyzed, or production of cDNA from the RNA of the biological sample;
b) specific amplification of said DNA sequence of the SPG4 gene likely to contain a
mutation, using primers according to either of claims 5 and 6 or a nucleic acid
according to claim 13;
c) analysis of the amplification products obtained and comparison of their sequence with
the corresponding normal sequence of the SPG4 gene.

33. Method for diagnosing AD-HSP associated with abnormal expression of a
polypeptide encoded by the SPG4 gene, characterized in that one or more antibodies
according to claim 29 is (are) brought into contact with the biological material to be tested,
under conditions which allow the possible formation of specific immunological complexes
between said polypeptide and said antibody or antibodies, and in that the immunological
complexes possibly formed are detected and/or quantified.

34. Method for selecting a chemical or biochemical compound which is capable
of interacting directly or indirectly with a polypeptide according to one of claims 14 to 18,
and 28, or with a nucleic acid according to one of claims 1 to 7, and 13, and/or which
makes it possible to modulate the expression or the activity of these polypeptides,
characterized in that it comprises bringing a nucleic acid sequence according to one of
claims 1 to 7, and 13, a polypeptide according to one of claims 14 to 18, and 28, a vector
according to either of claims 19 and 20, a cell according to claim 21, a mammal according
to either of claims 22 and 23 or an antibody according to claim 29 into contact with a
candidate compound, and detecting a modification of the activity of said polypeptide.

35. Use of a nucleic acid sequence according to one of claims 1 to 7, and 13,
of a polypeptide according to one of claims 14 to 18, and 28, of a vector according to
either of claims 19 and 20, of a cell according to claim 21, of a mammal according to
either of claims 22 and 23 or of an antibody according to claim 29, for studying the
expression or the activity of the SPG4 gene.

36. Kit or pack for diagnosis, characterized in that it comprises at least one
compound chosen from the following group of compounds:

a) a nucleic acid according to either of claims 5 and 6; and
b) an antibody according to claim 29.

| GCTCCTGACACCQ^CCCGCACACCGGGGTCl^TGGCCCCCGCCGTAGCAGTGGCTGCCGCCCTCGCTTGGTTCCCGTCGGTCTGCGGG AGGCGGG 95
I TTATGGCGCCGGCCCCACTGAGAa^TGAATCAATTCTCCGGCTCCACGAGCCAAGAAGAA^ 190 
1 HHSPCGRGKKKGSGGASNPVPP 

CAGGCCTCCGCCCCCTrGCCTGGCCCCCGCCCCTCCCCXaXCGGGCCGGCCCCTC 2BS 
23 RPPPPCLAPAPPAAGPAPPPESPHKRHI.TYFS 

CCTACCO;CTCTTTGTA«KTTCreGCrGCT«CCT^ 380 
55 IPI.FVGFALLRI.VAFBI.GI,I.FVKLCOKFSRA 

CTCATGGCAGCCAAGACKAGCTCCGGGGCCGCGCCAGI^CCTGeCTC 47 5 

B€ LHAAXRSSGAAPAPASASAFA PVPGGEAERVR 

AGTCTTXXACAAA«GGCCTTCGAGTACAICTCCATTGCCCTGCGCATCGAT^ 570 
118 VFHKQAFEY IS I AIR I DED EK AlGOKEOAV E W I 

ATAAGAAAGGTATTGAAGAACTGGAAAAAGGAATAGCTGTTATAGTTACAGWCAAC^^ 665 
150 KKCIEELEKGIAVIVTGQG3EQCERARRI.QAK 

ATGATGACTAATTTGGTTATGGCCAAGGACCGCTTACAACTTCTAG^GAAGATGCAACCAGTTTTGCCATTTTCCAAGTCACAAACGGACGTCTA 7 6 0 
18lMMTHLTMARDRI.OLLE4ieMOPVLFPSKSQTDVr 

rAATGACAGTACTAACTTGGCATGCCGCAATGGACATCTCCACTCA<^AAG 8 S 5 

213 NDSTBLACRWCHLOSESSGAVPKRKDPLTHTSN 

ATTCACTGCCTCGTTC AAAAACAGTTATGAAAA CTGGATCTCCAGGC CTTTCAGGCCACCATA G AGCA CCTAGTTACAGTGGTTT ATCCATGGTT 950 
2*5 SLPRSKTVMKTGSAC LSGHRRAPSTSCLSHV 

TCTGGAGTGAAACAGGGATCTGGTCXTGCTCCTACCACTCATAACp^ACTCCGAAAA 104 5 

276 SGVK0GSGPAPTTHKG6rPKT» R T H K PSTPTTA 

TACTCGTAAGAAAAAAGACTTGAAGAATTTTAGGAATCTGGACAGCAAC^ 1140 
308 TRKKKDLKNFRNVDStII>AltLIH)IBXVDNG7TAV 

TTAAATTTGATGATATAGCTGGTCAAGACTTGGCAAAACAAGCATTGCAAGAAATTGTTA'i I' L 'i' l 'CCTTCTCTGAGGCCTGAG^rTGTTCACAGGG 1235 
340 KFDDIAGQDLAKQA1.0EIVILP5-LRPEL«FTG 

CTTAGAGCTCCTGCCAGAGGGCTGTTACTCTTTGGl^CACCTGGGAATGGGAAGACAATGCTt^SCTAAAGCAGTAGCTGCAGAATCGAATGCAAC 1330 
371 LRAPARGLLLF GPP GNCKT HLA9KAVAAESNAT 

CTTCTTTAATATAAGTGCTGCAAGTTTAACTTCAAAATACtTGGGAGAAGGAGAGAAATTGGTGAGGGCTC'l i'T'l TGCTGTGGCTCGAGAACTTC 14 25 
40 3 FFN1SAASLTSKY VlOG L GEKLVRALFAVARELQ 

AACCTTCTATAATTTTTATAG|iTGAAGTTGATAGCCTTTTGTGTGAAAGAAGAGAAGGGGAGCACGATGCTAGTAGACGCCTAAAAACTGAATTr 1520 
43S F S I I F I DLLE VDSLLCEFRECEHDASRRLKTEF 

CTAATAGAATTTGATGG^t^ACAGTCTGCTGGAGATGACAGAGTACTTGTAATGGGTGCAACTAATAGGCCACAAGAGCTTGATGAGGCTGTTCT 1615 
466 L I E F D G Vl20 S A G D D R VL VHGATKRPOELDEAVL 

CAC^CGTTTCATCAAACGGGTATATGTGTCTTTACCAAATGAGGAG^CAAGACT 1710 

TGACCCAAAAAGAACTAGCACAACTTGCTA^^TGACT^ATGGATACTCAGGAAGTGACCTAACAGCTTTGGCAAAAGATGCAGCACTGGGTCCT 1805 
53 0 TOKELAOLAR MIST DGY£GSDLTALAKDAAI>GP 

ATCCGAG|\ACTAAAACCAGAACAGGTCAAeAArA7«rCrcC^^ 1900 
561 I P E16L KPEOVKNMSASE K17R NIRl.SDFTESLKKI 

AAAACGCAGCGTCAGCCCTX^AAACTTTAGAAGCGTACATACGrTGGAACAAGGACTTTGGAGATACCACTGTTTAAGGAAATACCTTTGTAAACC 199 5 
593 KRSVSPOTLEAYXRKHKDFGDTTV* 

TGCAGAACATTTTACTTAAAAGA<^AAACACAAGATCTTCAAT^AACGTCATCGGCTACAGAAACAGCCTAAGTTTACAGGACTTTTTAGAGTCT 2090 
TACATATTTGTGCACCAAACTTGAAGATGAACCAGAAAACACACTTAAACAAAATATACAATGCAAATGTAA l ni ' ll ' Ul l GTTTAAGGCCTTGC 2185 
CTTGATGGTCACAGTTATCCCAATGGACACTAAGTTAGAGCACAACAA^ 2280 
AATTTGTATATTGTGTTGCAGATGAAAGTATTCCAGGAACAGTGAATGGTAGAAGACACAAG AACAi l iltl I ' ll. ' 11 i CTCTTCTGATGTTTTTTC 2375 
TTAAAATACTAATTTCTCCTA ll 1 1 1V11 i i\ rrACTGTTGTCTTAACTACAGGTGATTGGAATGCCAAACACTCTTAAGTTTA l 1 mil 1 1 M C 2470 
GTTTCATAAATICAGTGTGCCAAATCAAA Cl ' i r i ' a CCTAAGTAAC^ 2 56 5 

GACATTAAACAATTGTTGTGTTC ) 1 ITI AC CT ' i 1J ATTTTTCTATTACCTTGCTACCAAACAGTTTAGATAGCAATATAATAGCAAAAAAGCAAA 2660 
TATGGTAAAATAGAGAAGGTTTCAAGGTTTGAGTTACTCTGTCATATAACATGTAGATCAGTCTTCATCTGACCTCCAGTA' i i ' l 1 1 I t 1 H.T AAT 2755 
GTATTTGTCAGAAATCTGTTGTAGACTGTTAACTTCTTCCTGATGGAATTTATTTTCTGCAA 2850 
ACTGCTGTGAAAATGTTTCCAGTGCAAGAGAAGGGAAATACTAGGAACTAAGACATTTCTAATTTATTGCTTATTA C1 1 111 I AA IT TT ACAGGA 294 5 
TAATTATAAGCAAGTGGAACTACCAICTTTTATTCTTAATAATTATTAATCCCTTCAATGAT^CTTTAAAAAAACTGAA li i ' l 1 ATACATCGCAT 304 0 
ACATTTTTCTAGTTCCTTCTGCTTGCTTTATrAACTC^ 3135 
TCCATTlGTT'lGTATAAATATGCCTGGATTTTCATTATAAAAAT^TCATTXSTAGGGAGTAGAGACTCATATCATGGCCTTTTAAATATTGT AArA 3230 
AAGGCAAATAGATATTTGCCCTTAGTTTACTGG 326 3

Nun: I (...] 4 59

House: 1 AGGCCGAGACCCTCCGCCTCTtCCACAAGCAGGCCTTCGAGTACATCTCCATTGCCCTGC 60 

1 1 1 1 it 1 1 1 1 1 1 1 1 1 intitum 1 1 1 1 1 1 1 1 1 1 1 1 r 1 1 1 1 i 1 1 1 1 1 1 1 f 1 1 1 1 1 

ntJMiK 460 AG<^CGAGOSCGT<X;u^GTCTTCCACAAACAGGCCITCGAGTACATCTCCATTCCCCTGC S19 
HOUJC: SI GCATCGACGACGAACAGAAAGCAGCACACAAGGAACAAGCTCTGGAATGGTATAAGAAAC 120 

IUI11I Mtll IIIMIIIfllllMlilll MHIiniilHtllllllMIII 

tloaatn: 520 GCATCGATGAG&ATGAGAAAGCAGGACAGAAGGAGCAAGCTGTCGAATGGTATAAGAAAG 579 
House: 121 CTATCGAACAACTGGAAAAAGGAATCGCTGTTATACJTTACGGGCCAACCrrGAACAGTATG 180 

(in ummmmiimi uuiuuiun u in iiiinii 1 1 11 

JJun.n: 560 GIATTGAACAACTGGAAAAAGGAATACCTCTTATAGrrACAGGACAACGTGAACACTGTG 639 
Mouse: 181 AAAGAGCTAGAC G tCT l I^AAGCCAAAATgATGACTAATTTAGTTATGCCCAAGCACCCTT 24 0 

11111111111)1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i tl I Till I! MI 1 1 1 f I 

Unman: 640 AAAGAGCT AGACGCCTTCAAGCTAAAATGATCACT AATTTGGTTATGGCCAAGGACCGCT 699 
Mouse: 241 TACAACTTCrAGACAAGCTGCAACCaGITTTCCAArrTTCCAAGTCACAGACGGACGTCT 300 

I II it I II 111 St It 11 1 1 1 1 1 ( 1 1 1 1 1 1 1 1 1 lUllllllllllil llllllllll 

Bo»Jin: 700 TACAACTTCTAGACAAGATGCAACCAGTTTnJCCArTTTCCAAGTCACAAACGGACGTCT 759 
Mouse: 301 ATAACGAGAGTAC7AACCTGACATGCCGCAATGGACATCTCCAG7CAGAAAGTCGAGCAG 360 

till II 1 1 1 1 1 1 1 1 f II IllUmUUIUUUIIIIimiUllimi t 

Bnsin: 760 ATAATGACAGTACTAACTTGGCAT^KXGCAATGGACATCTCCAGTCAGAAAGTGGAGCTG 819 
House: 361 TTCCGAAGAGGaAAGACCCCTTAACACATGCTAG^AATTCATTGCCTCGATCAAAAACTG 420 

Mil II II lllflllillllllllt tltllflllll If II 1 1 1 llllllll I 

B»»«n: B20 TTCCAAAAAGAAAAGACCCCTTAACACACACTAGTAATTCACTGCCTCGrrrCAAAAACAC 67 9 
«o«!e: 421 TCCTGAAAAGTGGCTCCCCAGGGCTCfCCGGTCACCACAGGGCGCCTAGTTGCAGTGGTT 480 

i nun mi ii mil u n h inn it ii mini mum 

Human : 860 TTATGAAAACTGGATCTGCAGGCCTTTCAGGCCACCATACXGCACCTAGTTACAGTGGTT 939 
House: 481 TGTCCATGGTTTCTCGAGCAAGACCGGGACCTGGTCCTGCAGCTACCACACATXAGGGTA 540 

i imiimiimii t n mi immm iiimi nmitm 

Euan: 940 TATCCATGGTTTC-TGGAGTGAAACAGGGATCIGGTCCTGCTCCT ACCACTCAT AAGGCTA 999 
House: 541 CTCCAAAACCAAA TAGAACCAACAAACCT7C7ACTCCCACAACTGCAGTTCGGAAAAAGA 600 

mi m mini 11 n immim n iimm m n n i 

Buun: 1000 CTCCGAAAACAAATAGGACAAATAAACCTTCTACCCCTACAACTGCTACTCGTAAGAAAA 1059 
Moose: 601 AAGACTTGAAAAATTTTACG AATCTGG ACAGCAATCTTGCT AACCTT AT AATGAATGAAA 660 

mmim iimimmmmiim miiimiimmmnm 

IWU«' 1060 AAGACTTGAAGAATTTTACCAATCTGCACAGCAACCTrGCTAACCTTATAATCAATGAAA 1119 
Mouse- 661 TTGTrGACAATGGGACAGCTGTTAAGTTTGATGACATAGCCGGGCAGGAGCTGGCAAAGC 720 

mi mum minimi mum nm 11 m h mmi i 

Buoan: 1120 TTGTGGACAATCGAACAGCTGTTAAATTTCATGATATACCrcGTCAAGACTTGGeAAAAC 1179 
Mouse. 721 AAGCGCTGCAGGAGATTGTCATCCTTCCTTCTCTGCGGCCTCAGTTGTTCACAGCCCTCA 780 

mi im ii nm n mmmm iiiiiiiiiiiiiiiiiiiin i 

Human- 1180 AAGCATTGCAACAJ^TTGTTATTCTTCCTTCTCTGAGGCCTGAGTTGTTCACAGGGCTTA 123 9 
House- 781 GAGCTCCTGCTAGAGGCTTGTTACTCTTCGGICCGCCAGCAAACGGAAAAACAATGCTGG B40 

mmim nm mmiiii mn it u 11 11 u iiimmi 

nn» 1240 GAGCTCCTGCCAGASGGCTGTTACTCTTTGCTCCACCTGGGAATGGGAAGACAATGCTGG 1299 
MciUse 84 2 CTAAAGCAGTAGCTGCAGACTCTAATGCGACCTTTTTCAACATAAGTGCTGCCAGTTT-AA 90C 

imtimiuimm u nm nm n n miimm mini 

Ilomnn 1300 CTAAAGCAGTAGCTGCAGAATGGAATGCAACCTTCTTTAATATAAGTGCTGCAAGTTTAA 1359 
Mouse 901 CTTCAAAATATGTGGGAGAAGGAGAGAAATTGGTGAGAGCTCTCTTTGCTGTCGCTCCAG 960 

mmim iiiiiiiiiiiiiiiiiiiiiiiiii nm iiiiiiiiiiniiii 

Burner. 1360 CTTCAAAATACGTGGCAGAAGGAGAGAAATTCGTGAGGCCTCTTTTTCCTGTGGCTCGAG 1419 
MOUS" 961 AACTTCAACCATCTATAA'M X " ] 1 ATAGATGAAGTTGACAGTCTTTTGTGTGAGAGACCGC 1020 

ummn miiimimiimmiim u mnmm in i i 

nunuin 1«20 AACTTCAACCTTCTATAATTTTTATAGATGAAGTTCATAGCCTTTTGTGTCAAAGAACAC 1479 
HOUW 102] AAGGGGAGCACGACGCTAGCAGACKCCTAAAGACGGAATTTTTAATAGAATTTGACGCGG 1080 

iiiiiiiiiiiii inn 1 1 1 e i mn ii linn miiimim n i 

Hunan: H80 AAGGGGAGCACGATGCTAGTAGACGCCTAAAAACTGAATTTCTAArACAATTTGATGGTG 1539 
House 1081 TGCAATCTCCTGGAGATGACAGAGTACrTGTAATGGGTGCAACTAACAGGCCCCAAGAGC 1140 

i ii mimimimumiiiiiiiiiimimmi nm mini 

BQ»an: 1S40 TACAGTCTGCTGGAGArGACAGAGTACT7-G7AATGGGTGCAACTAATAGGCCACAAGAGC 1599 
Mouse 1141 TTGATGAAGCTGI1CTCACGCGTTTCAITAAACGGGTATATGTGTCCTTACCAAATGAGG 1200 

mini mummiiimm nummmmi miimnm 

Ru»an. 1600 TTGATGAGGCTGTTCTCAGCCGTTTCATCAAACGGGTATATGTGTCTTTACCAAATCAGG 16S9 
Mouse 1201 AGACAAGACTCCTTCTGCTTAAAAACCTGTTGTGTAAACAAGGAAGTCCACTGACCCAAA 1260 

mmim m iimmH nm iimmimiimi imimi 

nu»an: 1660 AGACAAGACtACTTTTGCTTAAAAATCTGTTATGTAAACAAGGAAGTCCATTGACCCAAA 1719 
Mouse- 3 261 AAGAACTCGCACAGCTTGCTAGAATGACCGATGGATACrCTGGAAGTGATCTGACCGCrr 1320 

mmi mn mmmmii imimiii iiiiim n n mi 

Ho»an- 1720 AAGAACTAGCACAACTTGCTAGAATGACTGATGGATACTCAGGAAGTCACCTAACACCTT 1779 
Mouse: 1321 TGGCCAAGGATGCAGCCCTGGCTCCTATCCGAGAACTGAAGCCAGAGCAGGTGAAGAATA 1380 

mi it iiiiiiii mmmmiimiii u mn iiumnmi 

Bunan- 1780 TGGCAAAAGATGCAGCACTGGGTCCTATCCGAGAACTAAAACCAGAACAGGTGAAGAATA 1839 
Mouse: 1381 TGTCTGCCAGTGAGATGAGAAATATTCGATTATCTGACTTCACAGAATCCTTAAAAAAGA 1440 

mmiimimmiimmmiimimmu nmm mn i 

Human: 1840 TGTCTGCCAGTGACATGAGAAAIATTCGATrATCTGACTTCACTGAATCCTTGAAAAAAA 1899 
Moose: 14 41 TAAAACGCAGTGTGAGTCCTCAGACCTTAGAAGCATACATACGC7GGAACAAGGATTTTG 15O0 

mmim u n mn it mum mum iimuim 1111 

1900 TAAAACGCAGCGTCAGCCCTCAAACTTTAGAAGCGTACATACGT7GGAACAAGCACTTTG 19S9 
PWvf: 1501 CAGACACCACTGTTTAAAGGAAT 1523 

mi immimi i in 

Hunan. 1960 CAGATACCACTGTTTAAGGAAAT ]982 



I 



I 1263 



I GGATGCCTCTGTGAGCCCATAGAACATCGCACTTCACAGGAAACAAGAGCTTTGCCTACA 1583 
I GGAACCCAGACTTCGTTTACAGGACGTTTTACAGTTTTCATTTTTGTGCACCAAACTTGA 1643 
: AGAGGAACAAGAAGACAGACCTAAATAAAATATGCAATATGAATGG 1689 
ctggcaaaca 
aaaattagcc 
gtaatcccag 
cacctgtagt 
tccttctcaa 


aaataaataa 


ataaataaat 
ataaataaaa 


ttttgagctg 


ggcatgaaag 


ctgaggcagg 


aggatccctt 


gagcccagca 
gtttgagacc 


ccagtgagct 


ataattctga 


cactgcactt 


cagcctggct 


gacagaggga 


1140 
gaccgtgtat 

tagatttcac 

atgttaagct 

tcacgcctat 

gtttgagatc 

gctgggatat 

ccacttgagc 

ctggacgaca 

agtaacagct 

attcacatca 

tatatgcctg 

atcttattaa 

ttcatagaac 

ctaatcattc 

ttgttacttc 

ggttttctgc 

tttttttttt 

ccacacttga 

tagggccgaa 

gttaaagtga 

ttttgtcctt 

caagctgaaa 

catctaggca 

ttttcttctt 

ctgagctcaa 

ggcactgttt 

acaacagtct 

aaaaaatgag 

aaacaaaatg 



ctaaaaagaa 

ataaaaattt 

tcacattcct 

aaacccagtg 

agcctggcca 

ggtggtgtgc 

cctggaggtt 

gagagacatt 

gtcctgttca 

tacatgcatt 

accactgttg 

tcctgacacc 

aaatgaatga 

acaattatgt 

tctcatccct 

ctttttcaaa 

ttttactaat 

aatctagaat 

atatccaaaa 

taaagtgacc 

gtgaaactgt 

actgggaggc 

cattgcagaa 

tttttttaga 

gtgatcctcc 

tttttgtttt 

tgagactgag 

attccagtag 

tattgagctc 



taaaataaca 

agaattctgg 

gaaaggcaaa 

ctttgggagg 

acacagtcaa 

acttgtggtc 

gagggggcag 

gtctcaaaaa 

attacaggat 

tttgcatgcc 

ctattggaag 

ccacttattg 

ataatatgtg 

ttttccttct 

cccctccaac 

atcagccatt 

ttttttagtt 

ctctcgaatt 

aaaactattt 

gaatgtcctg 

ctcagattcc 

tctcacactg 

ccagggtaac 

gacagggtct 

tgagtagctg 

ttttgttttt 

atataattcc 

agtcagaaat 

tgtcatgttg 



atgatttttg 

tttctcttga 

aatcagtgga 

ctagggtaag 

accccatctc 

ccaactactc 

tgagccatga 

gaaaaaaaaa 

gcaactcttt 

acacaaccca 

ttttggccac 

cctgatatat 

ccacattgtg 

taatacagag 

catatctttt 

tcctcactgg 

gaaaagaggt 

gaaggtctga 

gatggtaggc 

gattagttag 

attcaagatt 

taggtagaat 

accaaggcat 

cattatgtta 

ggacttcagg 

tttgacacaa 

aaggagcaga 

ctgaaatggc 

caggcatcat 



agccaataac 

aaaattaaaa 

agctagctgg 

tggattggtt 

tacaaaacat 

aggaggctga 

ttgtgccact 

aatcagctca 

agcttctcat 

cacatggaac 

tgcattaaac 

tgtccgtttc 

gactcaattc 

attggataat 

catttgtttt 

actctacatg 

ccttaatatc 

agagttcctt 

actgtggtaa 

cacagtacct 

aagtgtcctg 

ggctagcagg 

tatttttttt 

cccaggctgg 

tgtgcaccgc 

atttaggaag 

agatgtgagt 

attacagata 

gatggaggtt 



tcttagccaa 

aaatctgaca 

gtgctgtggc 

gagtccagga 

acaaaaatta 

ggcaggagga 

gcactccagc 

gtgggagctg 

agtttccatc 

ccatatatgt 

tataaactcc 

ttaatatcta 

agggagatga 

aattccccaa 

tcttatttta 

tgccattttt 

tgtcattggt 

agaaggcaag 

ataaatatgg 

agctccttct 

aaagttctta 

gttgggatct 

ctttctttac 

attccaattc 

tgtgtctgaa 

atgttaattc 

ttagagcact 

taagagacaa 

ttagatgtac 



1200 

1260 

1320 

1380 

1440 

1500 

1560 

1620 

1680 

1740 

1800 

1860 

1920 

1980 

2040 

2100 

2160 

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 
tctttcattt 
ctctaaatcc 
atattaagta 
aacataatgc 
tttcataatt 
taccttaaat 
agactcttag 
acatgtaatc 
agtttgagat 
tagccaaaca 
atcgcttgag 
actctagtct 
gcttaactga 
tatgggttcg 
ctctccttaa 
aagctggagt 
gattctcctg 
cagctaattt 
tgaactcctg 
cgttagccac 
gttggaaaat 
aaattctaca 
tatcctcata 
gcttattttc 
tagtacaaat 
ctcacactct 
atatgtccca 
atacagccat 



tgtaattttt 
catattttta 
caaccagata 
ccatgaatga 
aaaatagaaa 
tactgaggca 
agccccaaat 
ccagcactgg 
cagcctgggc 
tggtggtgtg 
cccgagaggt 
gggtgacaga 
agtagcaata 
ataaatatta 
acttaagcat 
tcagtggtgt 
tctcaccctc 
ttcgtatttt 
acctcaagtg 
agcatccagc 
gctgtcttaa 
cataatcatt 
tatatgcccc 
ccatatattg 
gacactgtgc 
cttggaaacc 
gctaatagcc 
tcggtcaagc 



atagaggaat 
ccatacaaaa 
gcagagactc 
aagcccatca 
cagactatgt 
gtaagtgtaa 
tctttatttt 
ggatgccaaa 
aacacggtta 
cacctgtgtt 
tgagggtgca 
gtgagactct 
ttttaaaaag 
gcaagtagta 
gttttttgtt 
gatcttggct 
ccaagtagct 
tggtagaggt 
atccacctgt 
cttaagcatg 
atgagatgct 
gtgctaaatt 
tttgcaatgt 
cactagagtt 
aactttggat 
agatgcaatg 
acaatcaacc 
catcagatga 



taactagaat 
aaagagcaaa 
agtaaatggg 
cttgcgcttc 
aaaaatatta 
ttaactaata 
aaaaaactga 
gcaggcagat 
agacctcatc 
cctggctact 
gtgagccatg 
gtcttggggg 
gcactaaaag 
gtagtcatca 
tttttgagac 
cactccaacc 
gggaccacag 
ggggtttcac 
ctcggcctcc 
ttaattaagt 
taagctgccg 
acttgcaaag 
gactttgcta 
ggccttctga 
tttaggtttc 
taaagaagtc 
tctgaacata 
ctacatccac 



agcaacccca 
agtgcagaaa 
aggccggagg 
aggggctaac 
ttcttgagat 
tgtgatgttg 
ggccagatga 
cacttgagct 
tctaaacaaa 
caggaggctg 
atcttgccat 
aaaacaaaag 
ttcatctgct 
tcatcactgt 
agtgtctcac 
tctgcctccc 
acacgtacca 
catgttggcc 
caaagtgctg 
ttttataatt 
tctgaacatg 
atggccacaa 
cttctctatc 
cttgctttga 
gagagaactt 
agggctatcc 
tgaatgaggc 
aggaatgatc 



gtcccactaa 
agcacagtca 
cccgaaaccc 
aatatactta 
cccagatttt 
ggcaaataac 
ggtggcttac 
tgagcttagg 
atacaaaaat 
aggtggaagg 
tgcactctac 
agatgataat 
tagttcagaa 
cactgctgtt 
tctgtcaccc 
aggttcaagt 
caaccacacc 
aggctggtct 
ggattacagg 
cagcaaaatg 
aggtagaagg 
caattcctcc 
aagatgtgga 
caatggaatg 
acaccttcca 
tgctagagac 
tagctaggcc 
cacaggcaag 



2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600 
3660 
3720 
3780 
3840 
3900 
3960 
4020 
4080 
4140 
4200 
4260 
4320 
4380 
4440 
4500 
4560 
gccatcagaa 


gaaccatcca 


gctgaactta 


ccccaaattg 


ctgagtcaca 


aagttgtgtg 


4620 


taaataaatg 


tctgctatct 


taagccagtg 


agttttggag 


tggtatatta 


catagcatca 


4680 


gaaatctaac 


acaatcatta 


tgtttgaatc 


atttttcaaa 


tttctcatat 


ttattaaatg 


4740 


agtaccataa 


gcaaggtgtc 


aggctggatg 


caaaaagtga 


ggcaaaatgt 


ataaagtgtg 


4800 


accactgcct 


tcagtaagtt 


tacaatctat 


atcaagaggt 


gatgaagtgt 


ttaaataatc 


4860 


atcctgcagg 


gcaatatagt 


ataagagcca 


cagagtaaca 


caaccatatt 


gtcataacaa 


4920 


ctgaaaaaca 


agatcatttc 


tgctggaggt 


gataatggaa 


taatttatca 


agaatataac 


4980 


agagctggac 


gcggtggctc 


acacctgtaa 


tcccagcact 


ttgggaggcc 


aaggaaggtg 


5040 


gatcacaagg 


tcaggagttc 


gagaccatcc 


tggctaacac 


gatgaacccg 


tctctactaa 


5100 


aaatacaaga 


aattagccgg 


gcgtggtggc 


acgcgtctgt 


agtcgcagct 


actcaggagg 


5160 


ctgaggcagg 


agaaccactt 


gaacatggga 


agcagaggtt 


gcattgagct 


gagatcgtgc 


5220 


catggcactc 


cagcctgggt 


gacagagtga 


gactcagtct 


caaaaaaaaa 


aaaaaaaaaa 


5280 


aaaatataac 


attagaggta 


agtcttgaag 


gactttgaca 


gtggaagtag 


gaggcgaggc 


5340 


cattctaagt 


gaatgaaaaa 


tgacaggaga 


gtaattgtag 


tcctggaaaa 


gagcaaagta 


5400 


ggtacagacc 


aacagtctat 


attagctaga 


gtatagtgaa 


agtgcagagg 


aaatgtcgga 


5460 


gaaccattct 


ttattcaaaa 


actatcttcc 


tcatggccaa 


gcatagtggc 


tcatgcctgt 


5520 


aatcccagca 


ttttgggagg 


tcaaggtgag 


tggatcactt 


gagctcagga 


attcaagacc 


5580 


atctggggca 


acatagtgag 


acctcatctc 


aactaaaaaa 


caaaaaattc 


agacagatgc 


5640 


agtggctcac 


acctgtaatc 


ccagaacttt 


ggtaggctga 


ggcgggcgga 


tcacgaggtc 


5700 


aggagatcaa 


gaccctcctg 


gacaacatgg 


agaaacccca 


tctctattaa 


aaatacaaaa 


5760 


ttagctgggc 


atggtggcac 


atccctgtaa 


tcccagctac 


tcgggaggct 


gaggcaggag 


5820 


aatcgcttga 


accagggagt 


cggaggttgc 


agtgagccga 


gatcgcacca 


ctgcactcca 


5880 


gtctggcgac 


agagcgagac 


tccatcttaa 


aaaataaata 


aattttaaaa 


aaaactaccc 


5940 


cagcatggtg 


gtgcatgcct 


gtagtcccag 


ttactcagga 


ggctgaggca 


agagggtggt 


6000 


ttgagccagg 


gaggtcaagg 


ctgcagtgag 


ctctgatggc 


gccactgtac 


tccagcttgg 


6060 


gtgacagagt 


gagaccttgt 


ctcaaaaaca 


aaaacaaaaa 


caaaaaacca 


acaaatctcc 


6120 


ttgttagtat 


catggtgagt 


aaaaaataaa 


ataaaaatag 


aaataaactg 


aacatggtgg 


6180 


ctcatgcctg 


taatcctagc 


actttggaag 


gctgaagtgg 


gaggattgct 


tgagggctgg 


6240 


agttcaaaac 


tggcttgggc 


aacacggtga 


gagagacctt 


gtctctacaa 


aagaactttt 


6300 
aaaacaaaaa 


atagataatt 


taaaaaaatt 


aaaaaaaaca 


aaaaataaaa 


aaataatcaa 


6360 


gtatcaactt 


gattccaggc 


actgcttact 


actctagtgt 


tatactgtag 


atgtggaagc 


6420 


tgagtaactc 


atccaagatc 


accgaaagtg 


atggaacaca 


gatctaaatg 


caaccagtct 


6480 


gactccagga 


ccatttaacc 


attctactat 


tgggccctat 


cttggctaag 


ttagaaagta 


6540 


agttactttc 


tttagtggta 


aagactggag 


ggataacagg 


gaagatagtt 


atttaagaaa 


6600 


aaaaactggc 


atcaaactaa 


atatccatca 


atagttgaac 


agtaaaatag 


gttgtggtaa 


6660 


attcatataa 


tggaatacta 


tatagcagtg 


aaaatgtacc 


acagttatag 


aaatcaacag 


6720 


ggaggaattt 


caacacttaa 


ttattaagta 


ggtagccagg 


catagcggtt 


tatgcctgta 


6780 


atcccagcac 


tttgggagac 


caagacagga 


ggattacttg 


agcccagggg 


ttcgagatca 


6840 


acctgggcaa 


cagtgagact 


ccatctctat 


tttcttaaaa 


taaaataaat 


gaaattttaa 


6900 


aaattttgag 


gagggaaagc 


aaacaaggga 


tacttgaaat 


atgattacat 


ttccataaag 


6960 


tcaaagtgag 


gcaaaatcat 


acaagacatt 


gtttagaaat 


acataaatac 


actgcaaact 


7020 


aaaaatgaga 


cactagaatg 


attaatataa 


aattcaggat 


agtggcttcc 


tctagaggaa 


7080 


gagacaagac 


attgagatta 


gggaggagct 


cacagagtgc 


ttcgaggagt 


tggttacatt 


7140 


catttttctt 


aaatggaatg 


ctgcttatta 


tttttcttta 


aattgtgcat 


ttaagtaaca 


7200 


cacttcttgt 


ttatatgata 


tatgtataaa 


tgtaattttt 


ttttttgaga 


tggagtttcg 


7260 


ctcttgttgc 


ccaggctgga 


gtgcaatggc 


actatcttgg 


ctcactgcaa 


cctccacttc 


7320 


ctgggttcaa 


gtgattctcc 


tgcctcagcc 


tcccgagtag 


ctgggattac 


aggcatgcgc 


7380 


caccatgccc 


ggctagtttt 


gtatttttaa 


tagagaaagg 


gtttctccat 


gttggtcagg 


7440 


ctggtctcga 


actcccgacc 


tcaggtgatc 


cgcctgcctt 


ggcctcccaa 


agtgttggga 


7500 


ttacaggtgt 


gagccaccgt 


gccaggccct 


gaatcagatt 


taaaagaggg 


catttcatta 


7560 


aaaaaaattt 


tttgttgttt 


gcttttgaga 


cagagtctcg 


ctctgtcgcc 


caggctgcag 


7620 


tgcattggca 


tgatcttggc 


tcaccgcggc 


ctcagcctcc 


caggttcaag 


tgattctcct 


7680 


gcctcagcct 


cgcactagtt 


gagattacag 


gaatgcacca 


ccaccacagg 


aatgcacctg 


7740 


tctaactttt 


gtatttttag 


tatagaggga 


gttttgccat 


gttagccagg 


ctgctcttga 


7800 


actcctgacc 


tccggtgatc 


tgctcgcctc 


ggctcccaaa 


gttctgggat 


tacaggcgtg 


7860 


agccaccaca 


cccggccgaa 


agagggcatt 


tcagaatgag 


ggtctagcat 


aagcacagag 


7920 


aagggggagc 


aataagaggg 


aaacagggag 


taggtcattt 


ttgcaatagc 


ctgtgacatt 


7980 



tgtagggcag 


tactggcggg 


gaataattaa 


gtaaaattgg 


ctggtgctgt 


ggctcatgcc 


8040 


tgtaatccca 


gcactttggg 


aggccgaggc 


gggcaggttg 


cttgagccca 


ggaattcaag 


8100 


accaacctgg 


gaaacatagc 


aagaccctgt 


ctcaacaaaa 


aagtaaaaaa 


attagctggg 


8160 


ggcgcgatgg 


ggtggctcat 


gcctgtaatc 


ccaacacttt 


ggaaggctga 


ggcaggcgga 


8220 


ttgcttgagc 


ccaggagttg 


gagaccagcc 


tgggcaacat 


ggtgaaaccc 


tggctctata 


8280 


aagaatacaa 


aaattagtcg 


ggcccagtgg 


cgtgtgcctg 


tgatcccagc 


tactcgggag 


8340 


gctgaggtgg 


aaggatcacc 


tgagccaggg 


aggtggaggt 


tgcagtgagt 


catgttgttt 


8400 


gcgccactgc 


actccagcct 


gggcaatgga 


gtgaaaccct 


gtccaaaaaa 


taaaaaaata 


8460 


aagctgtggc 


agaatgtgga 


gattcttgga 


agctggaagc 


tctcatgggg 


catttggaaa 


8520 


cctcacattg 


taaataacgg 


agtcttttta 


tcagtttggc 


ttccttagtt 


ttaggaaaca 


8580 


agaaataatt 


atggctaact 


caagtaaaaa 


gagaaagaga 


agagaaaaaa 


gacgtggaga 


8640 


tagagagaga 


gggagagaga 


ggaaaagacg 


aaaggaagga 


agggggaaag 


gagagaggaa 


8700 


gagagaaaca 


gagaaacaga 


ctgattagtg 


tattggatag 


attacataac 


caagtgacca 


8760 


gtcaggaacc 


cagcagctct 


gggggagctc 


aatgtgatgc 


attgataaac 


ccgctcttaa 


8820 


gagcactcgt 


ttccagttac 


tttctattcg 


gtgggtctcc 


agccaagatt 


ccaggtccca 


8880 


ggagaatctg 


actgacctag 


tgtttgcttc 


cgcctttgcg 


gtctgggttc 


tgtgcttgca 


8940 


gctcattaga 


atacagggag 


cagagacaag 


caggtagttt 


cccaaaggaa 


gggatgctga 


9000 


gtagattaaa 


aaaaaagtgt 


agattcttca 


gtaaactatg 


ggatggtaac 


tatgcaaaac 


9060 


ctaagatttc 


ccttattcaa 


ataaattatc 


tttcatatta 


gacatctaaa 


tatgcactaa 


9120 


tttagttaaa 


cccctgggtt 


agttgatctc 


atcacactga 


gctaacattt 


ttgttgctgt 


9180 


tgtttgcagt 


gacctgaagt 


ttcttatctt 


cacaattgct 


ttcctctcaa 


ataattccca 


9240 


gattttaaat 


ttttatttta 


ttttttctgg 


agacggagtc 


tcgctctgtc 


gcccaggctg 


9300 


gagtgcagtg 


gcgcgatctc 


agctcacttg 


cagcctctgc 


ctcccgagtt 


caagcgattc 


9360 


tccggcctta 


gccttccaac 


cagctgggac 


tacaggcgcg 


cgcccccacg 


cccggctaat 


9420 


ttaattccca 


gattgatatc 


cattgcttct 


gagatgggcc 


aattatcctt 


cggagaagac 


9480 


ttaggtcgcc 


tggcagaaaa 


agatgaaaga 


aatctaagaa 


aacgacgaca 


ctgagagagg 


9540 


agcctagcga 


accagcagag 


cgaccccaag 


ccgcaattcc 


cccttccgtg 


gatcgattac 


9600 


gaaggcttcc 


tggcaggagc 


tctccagggc 


tgccgacgtg 


agccgaactg 


cacattggga 


9660 


actgtagttg 


agtgggaaag 


ccgagaggcg 


ggggccgcac 


acgcgtacag 


gggccccggt 


9720 
caacaaagac 
gcggccgccg 
gagccaccga 
ggggccggcg 
ctgtggcccc 
ggcgggttat 
aagaagaaag 
ctggcccccg 
aacctgtact 
ttccacctgg 
gccaagagga 
ccgggcggcg 
attgccctgc 
ggcgccggga 
cgggagggga 
cggagcctca 
actttgtttc 
acccagacgt 
cactgtgaag 
gaagcagtgc 
gaacaaatgg 
attttaatca 
ttatatttag 
aagtagcctt 
accactttac 
acatctttcg 
gtcgcccaag 
ttccagcgat 



gcgccgtgcg 
ctgggagcca 
ctgcaggagg 
ggcagcgtgc 
cgccgtagca 
ggcggcggcg 
gctccggcgg 
cccctcccgc 
atttctccta 
ggctcctctt 
gctccggggc 
aggccgagcg 
gcatcgatga 
agaaggcggt 
cggtgcaccc 
tcttctagta 
agacaccagc 
gttgatgaca 
agtgtgcagc 
cgccttactg 
tgacaatttt 
agataatcat 
cataaaactt 
taatttccaa 
agtaaaacct 
ggtttctttt 
ctggagtgca 
tctcctgcct 



cgcgcgcgcc 
ccaggcggcg 
agaaggggtt 
ggcagtgcgg 
gtggctgccg 
gcagtgagag 
cgccagcaac 
cgccgggccg 
cccgctgttt 
cgtgtggctc 
cgcgccagca 
cgtccgagtc 
ggatgagaaa 
ggggtcgccg 
ccggaattga 
ttcttaaaac 
cttcccccac 
gtgacatttg 
ttcctctgaa 
gcttttaatg 
gaaagaaata 
tttatcagta 
ttccccctgt 
gttgaaatgt 
aaaacaacta 
aagctaactt 
atggtgcagt 
cagcctcccg 



ggagaaaaac 
gagaggacag 
gtgctcctgg 
agctcctgag 
ccgtcgcttg 
ctgtgaatga 
ccggtgcctc 
gcccctccgc 
gtaggcttcg 
tgccagcgct 
cctgcctcgg 
ttccacaaac 
ggtaactagg 
ggggagggca 
tatgccccgg 
ctctcccctt 
acttctgcat 
tcctagagtg 
ccaaggtttc 
aaagcagagt 
gctgcatatg 
caacgattcc 
tgctttgatt 
ttgatgaatg 
tgtatgtttc 
tttttttttt 
ctcggctcac 
agtagctggg 



acgggaagac 
cgacaggaag 
ccgaggaagg 
accggcgggc 
gttcccgtcg 
attctccggg 
ccaggcctcc 
ccgagtcgcc 
cgctgctgcg 
tctcccgcgc 
cctcggcccc 
aggccttcga 
gggctggggg 
acacctgcgt 
gagactgctt 
tcagggcact 
gacccaggtc 
accacactga 
caaaaggttt 
attgtagtgt 
actgcagttg 
tgaatacttt 
ttaattaaaa 
gattgcgtaa 
tgaatgaaag 
ttgagacgga 
tgcaacctcc 
attacaggca 



gtgcgcgtgc 
ggaggggccc 
agaaaggggc 
acacgggggt 
gtctgcggga 
tggacgaggg 
gcccccttgc 
gcataagcgg 
tttggtcgcc 
cctcatggca 
ggcgccggtg 
gtacatctcc 
agggggcggc 
cccttttctg 
tcccgtaggt 
gtagctgtcg 
actatgagac 
tcctttctag 
tgatattgaa 
cagaaaaaaa 
aatttgcatc 
ttcaatgaag 
cagtattcca 
acttaaacat 
caaggatact 
atctcgccct 
gactccctgg 
cgcaccacca 



9780 
9840 
9900 
9960 
10020 
10080 
10140 
10200 
10260 
10320 
10380 
10440 
10500 
10560 
10620 
10680 
10740 
10800 
10860 
10920 
10980 
11040 
11100 
11160 
11220 
11280 
11340 
11400 
cgcctggcta 
tctccatctc 
gcatgagcca 
gctattagct 
ccatgaaaag 
cagcatttta 
tgctaataat 
caattctccc 
aaacacccac 
aacagaaaca 
ttctagaatt 
ggatataagg 
agagaggtca 
ttgttaactg 
cagactggag 
cgattctctt 
gctaattttt 
aactcctgac 
cagccactgt 
cccttgggga 
gatcattttt 
ggggcgggtg 
catctctact 
tactcaggag 
cgagatcgtg 
aaagtaggtc 
ggctgaggtg 
aaaccctgtc 
cccagctact 



atttttgtat 
ctgacctcgt 
ccgcgcctgg 
ttgcatgtgt 
tgttatttaa 
gagatgaaga 
gaaatttctt 
cttttatcct 
ttttctacca 
atctgcactt 
tcaatcttgt 
aaagaggcag 
aggagggtgt 
ctccattctt 
tgcagtggcg 
gcctcagtct 
tgtattttta 
ctcaggtgat 
gcccacccag 
gaattaaaac 
tgggccagtc 
gatcatgagg 
aaaaatacaa 
gctgaggcaa 
ccactacact 
atttttggct 
ggtggattgc 
tttgtgaaaa 
tggggggctg 



ttttagtaga 
gatcgggtcg 
ccttaagcta 
tatctttttt 
tcctcacaat 
aaatgaggcc 
ttatggaaaa 
tacttccagt 
cacctcaatt 
cattcatagt 
tgaaacctgt 
ttattcattg 
taacatttaa 
tttttttttt 
tgatctcagc 
cccgagtagc 
gtagagatgg 
tcgcctgcct 
caagctccat 
ggttgcaaag 
atggtggctc 
tcaagagttt 
aaattagctg 
gagaagtgct 
ccagcctggg 
gggcacggtg 
ttgagcccag 
atacaaagat 
aggtaggagg 



gtagggattt 
cctcggcctc 
acatttttta 
ctttttaaaa 
aactttgtga 
caaaagataa 
taagtgaaat 
atgctgagat 
agatactcac 
gtctgtctct 
ttcctttgtg 
ttttggacag 
gaatactatg 
ttttttgaga 
ttactgcaac 
tggaaccaca 
tgtttcacca 
cgaccttcca 
tctttatcac 
ttttagaata 
acgcgtgtaa 
gagaccatcc 
ggcatggtga 
tgaacctgga 
cggcagagca 
gctcatgcct 
gagtttgaga 
tagctaggcg 
atcacttgag 



caccatgttg 
ccaaagtgct 
ttatatgtgc 
aaaatagcaa 
gatgaaggta 
aggaggttat 
taggagaagt 
cttgcttctc 
ttgcattgtc 
actgccaact 
gggcctggga 
taaggaaaga 
tgtttgtaga 
tggagtctcc 
ctctgcctcc 
ggtgcgccct 
tgttggccag 
agtgcttgga 
ctcttaagaa 
gaggaacatg 
tcccaacact 
tggccaacat 
cacatgcctg 
aggtggaggt 
agactccgtc 
gtaattccag 
ccagcctggg 
cagtggcaaa 
ctcaggttgt 



gccaggatgg 
ggtaatacag 
caggcattgt 
ccatcccaga 
ttattggtat 
tccaaacctg 
ctaactttta 
cctctgccaa 
cattagtgaa 
ccaaaaactg 
gtgggaagtg 
gtgacggtta 
aggaaatttt 
ctgtgtcacc 
cagacgcagg 
atcacggctg 
actggtcacg 
ttacaggtgt 
catccaggat 
tttaagcgta 
ttgggaggcc 
ggtgaaatcc 
tagtcccagc 
tgcagtgagc 
ttgaaaaaaa 
cactttggga 
caacatagtg 
tgcctgtagt 
tcaggctgca 



11460 
11520 
11580 
11640 
11700 
11760 
11820 
11880 
11940 
12000 
12060 
12120 
12180 
12240 
12300 
12360 
12420 
12480 
12540 
12600 
12660 
12720 
12780 
12840 
12900 
12960 
13020 
13080 
13140 
atgagctgag 
aaaaaaaaaa 
aaagtcagaa 
cagttgccca 
gattcaagcg 
cacgcccagc 
ggtttcgagc 
acaggtgtga 
ggagttggga 
accctgagct 
agaattcaga 
tatatctatt 
atataaaaat 
gagtggaatg 
ctcttttttt 
tcaacaatat 
ttaagtttac 
tctgttcaca 
tgagacagag 
gctagctccg 
ctacaggcgc 
cactgtgtta 
ccaaagtgct 
gtggacatca 
tcttcaccca 
ggttcaagcg 
cccacaccca 
ctgttctcga 



atcgtgcaac 
aaaaaaacca 
catgcaagga 
ggctggagtg 
attctcctgc 
taatttttgt 
tcctgacctc 
gccgcggtga 
ggtaggataa 
cttagatttg 
cctcaggtct 
atacttaaat 
aattatttaa 
tgatcatagc 
tttttttgct 
tttgaacaat 
gccattctcc 
actttttctt 
cctcgctttg 
cctcccgggt 
ccgccaccat 
gccaggatgg 
gggattacag 
gaatgggctg 
ggctggagtg 
attctcctgc 
gccaattttt 
actcctggcc 



tgcactccga 
aaaaaaaaaa 
aatttttttt 
caatggtatg 
ctcagcctcc 
gtttttagta 
aagtgaaccg 
ccgaccacaa 
aaggagaaat 
aagaaatagc 
caggctggtg 
cctcctaaac 
tttattttag 
tcactgcagc 
agagatagga 
aaaaaaaaaa 
tgcgtaatgc 
tatattttga 
tcgcccaggc 
tcacgccatt 
gcccagctaa 
tctcgatctc 
gcgtgagcca 
ttgacagctc 
caatggcgcg 
ctcagcctcc 
gtatttttag 
tcaagtgatc 



actgggtgac 
aactgtttta 
atttgtttat 
atcttggctc 
tgagcagctg 
gagatggggt 
ccctccttgg 
ggaaatttta 
taaggaaaac 
agttccatgt 
acttaaatct 
atttttattt 
agacagtgtt 
ctcaaggctc 
tcttgttatg 
taaattaggt 
atatttcata 
tttcttttct 
tagagtgcag 
ctcctgcctc 
ttttttgtat 
ctgacctcgt 
ccgcgcccgg 
tttttttttt 
atctcagctc 
ccagtaactg 
tagagacagc 
cactcgcctt 



aggagtaaaa 
attgttttat 
ttttgagacg 
actgcaacct 
ggattacagg 
tccaccatgt 
cctcccaaag 
gttaacactg 
ctaggcatga 
gaggaataag 
ttcagtatca 
ttcagttgga 
tcactctctt 
attcctaggc 
ttagccaggc 
tttattgtaa 
ctcttcctac 
cttttctttt 
tggcgcgatc 
agcttcccga 
ttttagtaga 
gatccacccg 
cacatatttt 
ttttgaggca 
actgcaacct 
ggattacagg 
atttcaccat 
ggcctcacaa 



ctgtctcaaa 
ttaggaagag 
gagtctcgct 
ctgcctcccg 
tgtatgccac 
tggccaggct 
tgctgggatt 
ttggttgatg 
aaaataaaag 
tggaagaaat 
catatatgaa 
tatattaaat 
acccaagctg 
ttaagtgatc 
tggagaaatt 
agtggtatgt 
tgataatgtt 
tttttttttt 
tcggctcact 
gtagctggga 
gacggggttt 
cctcggcctc 
gatttctaat 
gagtctcgct 
ccacctccca 
catgtgccac 
gttggccagg 
actgctagga 



13200 
13260 
13320 
13380 
13440 
13500 
13560 
13620 
13680 
13740 
13800 
13860 
13920 
13980 
14040 
14100 
14160 
14220 
14280 
14340 
14400 
14460 
14520 
14580 
14640 
14700 
14760 
14820 






ttacaggtgt 
aactgatgcc 
tcattttcct 
cccagctact 
gtgagctgag 
aaaaaaaaaa 
tcccagcact 
cctggccaac 
agtggtgcac 
tccagaggtg 
gagcaagact 
ttcttatcta 
tccagtggtt 
aaaccttagt 
atcaaataga 
ctatcaagga 
tttttttttt 
gcgagatctc 
cctcccaagt 
tttttagtag 
tgatccgccc 
gcctacttat 
aactgtagta 
ctcattttaa 
tttattctta 
tgctctgtca 
cccgggttca 
ccactatgcc 
caggctgatc 



gagccaccgt 
gtaatatgcc 
ttaaaaatga 
cgggagggtg 
atctcaccca 
aaaggctggg 
ttgggaggca 
atattagtga 
gcctgtagtc 
gagattgcgg 
ccatctctca 
tcttaaattt 
gaaagagagt 
tccaagagaa 
aacgagttga 
gctggtaata 
ttcttttttg 
agctcgctgc 
agctgggact 
agacggggtt 
accttggcct 
cttctaattt 
gtgttaattg 
tgaaatctcg 
tggcatacta 
cccacgctgg 
agcgattctc 
tggctaatat 
tcgaactcct 



gcccagcctt 
aaattaaatt 
agtaaaactt 
aggcaggaga 
ttgcactcct 
catggtggct 
gaggcaggcg 
aaccccgtct 
ccgggtactt 
tgggccaaga 
aaaaaaaaaa 
tttctttagg 
gcacttttga 
tgtaattctt 
ataggcagtc 
tcatgcactg 
agacagagtc 
aagccccgcc 
acaggcgcct 
tcaccgtgtt 
cccaaagtgt 
aactgaaaac 
aaatatttgg 
tgtaaatgtg 
aaaaaaaaaa 
agtgcagtgg 
ctgcctcagt 
ttttgtattt 
gacctgaagt 



gacaggtctt 
agttcagact 
tagccggatg 
atcgcttgaa 
gcctgggtga 
caagccgggt 
gatcacttgt 
ctactaaaaa 
gggacgctga 
tcacgccact 
aaaaagaaaa 
agattgaata 
agtctgcctc 
cctctttctc 
tcttcaaagg 
ccattccctt 
ttgctctgtc 
tcccacgttc 
gccatcacac 
agccagtacg 
tgggattaca 
caatttattt 
taccttgaaa 
ttttatatgg 
aaattttttt 
cgcaatctcg 
ctcccgagta 
ttaggagaga 
gatccgcctg 



tagtttgatt 
gaaacggatt 
tggtggcggg 
cccaggaggt 
gaagagtgag 
gcagtggccc 
agtcagaagt 
tacaaaaatt 
cgcaggacaa 
gcacgccagc 
ctgagtttat 
tttttgtact 
ttggctgtcc 
agtgcttcaa 
tttcctaact 
ggcaacatga 
acccaggctg 
acgccatttt 
ctggctaatt 
gtctcgatct 
ggtgtgagcc 
gattcagtga 
tgttaaatgc 
tgactatgtt 
ttttttttgt 
gttcactgta 
gctggtacta 
cagggtttca 
cctcagcctc 



ttagttcaac 
acttaaagat 
tgtgtgtaat 
ggaggttgca 
actccatcaa 
atgcctgtta 
tcgcgaccag 
agctgggcat 
ttgcttgaac 
ctgggcaaca 
attgttatgg 
ttagtcttaa 
ttgacaacac 
aatatatatg 
ctgtggttaa 
cttatctttt 
gagtgcagtg 
cctgcctcag 
ttttttgtat 
cctgtcctcg 
actgcgcctg 
aatggcatca 
caaattaaat 
tattctgaat 
aatggagtct 
acctccacct 
caggcgtgca 
ccatgttggt 
ccaaagtgct 



14880 
14940 
15000 
15060 
15120 
15180 
15240 
15300 
15360 
15420 
15480 
15540 
15600 
15660 
15720 
15780 
15840 
15900 
15960 
16020 
16080 
16140 
16200 
16260 
16320 
16380 
16440 
16500 
16560 






gggattacag 
taaaggcctt 
tgttgactta 
cagttaattt 
gagatgtgtt 
catacacatt 
cttagctcat 
agtagctggg 
gatggtgatc 
aagactgaaa 
ctcatagcag 
tttatactgc 
aaaagagaag 
gctcctgtgg 
gtgacgtttg 
atgagagatc 
attgtctaaa 
agacagaaat 
catgtgtgta 
agagatttat 
actaagaaca 
cataagacgt 
ctagaagttg 
ggtcaaaaga 
ataggttgta 
gagagaatac 
attttctcct 
ttttccccac 



gtatgaccca 
aatttatctg 
atgagaggat 
gtacagctct 
aaaacattag 
ttgagacagg 
tgaagcttca 
tctacagaca 
ttgccctgtt 
acctgtgata 
aaacaaaaat 
aggtttttat 
gttgtaataa 
gtagacaagc 
atttctcaag 
aagtttctca 
taaaattttt 
tgtctaagta 
tttaaaataa 
tctgagccaa 
tctgtccaag 
cagttaaaca 
gggaggcttc 
gcttatctaa 
gctaccaagg 
attgtaaatg 
ggttcaggta 
aagagaaagc 



ctgcacccaa 
aaaccaaact 
atgtgaagtc 
gcattttaga 
ttatgtgatt 
gccttggtct 
gcctcgcagg 
tataccacca 
gactaagttg 
gccattttat 
agaatttaag 
gttgtaatgc 
tggttctttg 
cggaaatctc 
tacataggac 
ggatattcta 
tctattcata 
aaatattttt 
cttcgtcaat 
atatgagtga 
gtggtcaggc 
tgtaagatgt 
caggtcgtag 
agtcctggaa 
tttttattgt 
tttcttatga 
aaagacttgg 
tttgcagggc 



cccatactca 
atttcaaaag 
tatttattaa 
tatttgagaa 
aacaaatatg 
gttgttcagg 
ctcaatcgat 
tgcttggctg 
gtaaatattt 
ataaggagaa 
tgaatggact 
tggtaatgag 
gatttactag 
ctgggtaaca 
taaacagaaa 
gggctaaagg 
gttttaaagg 
ctgttcatag 
gaaaagagtc 
ccagtggccc 
tatagcttga 
acattggttc 
gcagattcaa 
tccatagaag 
acagatgaag 
gactttaaaa 
aaagggaaag 
catttcagaa 



aatttgacac 
aggaatagca 
agcaaatatt 
atatttattt 
tgtacatacg 
ctggagtgca 
ccacccacct 
attttttaat 
taattgttga 
gctgaagttc 
caaaatattg 
ctccttggaa 
aacatatcat 
caatggtgga 
aggcctagta 
atcaggcatc 
gctaaagggt 
ttttaagctg 
aaactctgta 
atgacacagc 
ttttatacac 
catctggaaa 
agattttctg 
ggagtgtctg 
cctccaggta 
ggtggcagac 
gattctctac 
tatgtcaaag 



tgaattttca 
cagcaaattc 
aattggaggc 
cctctccagt 
tatatatgta 
atgacaccat 
cagcctccct 
tttttgtaga 
actttcttgg 
aaagagtaga 
tactttttac 
tatttggagg 
gttctgcatg 
ggttctctag 
tgttatatga 
gaagacagaa 
caggcattga 
tgtatatgtg 
aaatatttga 
cccagtagat 
tttagggaga 
ggcaggaaaa 
attggcaatt 
gtttaaaata 
gcaggcttca 
tcttaagtta 
agaacgtaaa 
aaatataatt 



16620 
16680 
16740 
16800 
16860 
16920 
16980 
17040 
17100 
17160 
17220 
17280 
17340 
17400 
17460 
17520 
17580 
17640 
17700 
17760 
17820 
17880 
17940 
18000 
18060 
18120 
18180 
18240 






tagggtaaaa 
tctgttgccc 
gtgttcacgc 
ccatgcccag 
gatggtctcg 
tacaggcgtg 
tctgacgtga 
tcttaatatc 
agggtatatg 
ttaactttgg 
tttatttttt 
atataattca 
tttcagaaga 
actgctgaaa 
gcctaatgat 
acattcagat 
aaacatttca 
gcagccactg 
ttcctcatgt 
gtcctgatga 
agtgatggcg 
gcctgtgttg 
tattgaatta 
tctctctgtt 
actcttctat 
gacagggtct 
cacctcaccc 
ttcactcttg 
cctttttttc 



tacttcaatt 
aggctggagt 
cgttctcctg 
ctattttttt 
atctcttgac 
agccactgtg 
tgctgtacta 
tctgttttaa 
agtcatgtcc 
aatgcctttg 
gcttaccggt 
atattgtact 
tgaggatgga 
acctatacct 
tctactttga 
gataccatac 
cattttttct 
ttaaggactt 
tgaatacagt 
caattaagga 
tttcgctgtg 
gcccctcaaa 
ttgaagaagg 
ccacccagta 
attagttttc 
tgccgtattg 
cataaagtgc 
acctataata 
tctcgtttgt 



tgttttattt 
gcagtggcat 
cctcagcctc 
tgtattttta 
ctcgtgattc 
cccggcctca 
gagtcaggct 
tgttaatgct 
aacccccact 
gcaaggggag 
ttataaaaag 
ttatttaaaa 
gggtagaaat 
accactttga 
agtttctctg 
ttttccaaaa 
caagattatt 
tatcccttgt 
taaaccctat 
tagttagtaa 
ttggccaggt 
gtgttaggat 
atgttcccta 
gtagttgggt 
ttcactcttt 
cccaggctgg 
ggagattaca 
gtcctgcaaa 
gaattattaa 



atttatttat 
gatctcggct 
ctgagtagct 
gtagagacgg 
acccgcctcg 
aaatacttca 
gggaatttgg 
gatcagttgt 
tctcattatg 
ggtccatgag 
ttaatgaaaa 
ctcacgtata 
cagaagtgtc 
aaggattaat 
attaaactaa 
gataacattg 
tactggcatc 
tctgttttta 
taaactggat 
atggatattg 
tggtctcgaa 
tacaggcgtg 
acacttcctg 
ttttgatgtc 
tttttaaaaa 
tcttgaactc 
ggcatgagcc 
gccagtgaag 
caatcgctga 



tttttgagac 
cactgcaagc 
gggactacag 
gatttcacca 
gcctcccaaa 
atttctttca 
cgtcttattg 
ccctgaattc 
gcctgaacta 
tcagttgggg 
ttatcatatt 
aaatagctgt 
agatttggta 
ttcagacttg 
ggaataaatc 
cttttgatta 
tgcaccaaag 
gctggtttgt 
tccccatatt 
aatcatttta 
ctcctggcct 
agccactgca 
cctcttttcc 
tggcaaagta 
tgtttttaaa 
ctgagctcaa 
accatgcctg 
ctgttaatat 
tcttacacat 



agagtctcgc 
tccgcctcct 
gcgctggcca 
tgttggccag 
gtgctgggat 
tggcctgcta 
ctacaaaaca 
caaagggaag 
gttttttagg 
gtcttagagt 
tcataattct 
ccatatctgt 
attttcttac 
ctttctttgg 
tgataaatgg 
catatgcaat 
acacaaaaaa 
tgttgtattt 
actgttagtt 
tttttttagt 
caagtgatct 
cctggctgga 
gctgtcttac 
tagattgtct 
gcaagataga 
gcaatcctcc 
gccagttttc 
gctgacgtag 
catatacaat 



18300 
18360 
18420 
18480 
18540 
18600 
18660 
18720 
18780 
18840 
18900 
18960 
19020 
19080 
19140 
19200 
19260 
19320 
19380 
19440 
19500 
19560 
19620 
19680 
19740 
19800 
19860 
19920 
19980 






aaaacatctt 
atgtcaaaat 
tgtagacttg 
acttcataag 
tatggttgag 
aaccatggta 
tgattttaag 
tgcgaataat 
gtctaaagct 
tggtatttca 
tgtagagtgg 
taagtgccag 
taagctgttt 
ggctctgtca 
tcctggattc 
gtcaccacac 
aggctggtct 
agattacagg 
cccttttcga 
tcttgggcaa 
acataatagg 
tacagctaga 
ttaggatgca 
atataggccg 
gcagattacg 
actaaaaata 
tttgggaggc 
agatcgcgcc 



tgaacactgt 
ctgtgatttt 
ctgtacttac 
tgggctttgt 
caagtggggt 
tgtcatagta 
gagggtttaa 
tctatgaata 
gtagtagtta 
tggtttagac 
tcttgttcca 
gtcaaatacc 
ttctctttct 
cccaggctgg 
aagcgactct 
ccggctaatt 
cgaactccgg 
catgagtcac 
actgtggggc 
gttacttaac 
tactcagtaa 
ggttatgtta 
gtagcagcat 
ggcgtggtgg 
aggtcaggag 
caaaaacaaa 
tgaggcagga 
actgcacccc 



cacatctcag 
cttagaggtt 
ttacataact 
caatgccctc 
ttattaccca 
aagggtgtta 
agaagcaggg 
aatatcataa 
aaaagcaaag 
agtgttcagt 
ccgtggtcac 
aaggtttagc 
cagctttttt 
agttcagtgg 
cctgcctcag 
tttgtatttt 
acctcaagca 
tgcacctggc 
ttcttatgga 
tatgtatgct 
atgagaacta 
gaacattgtt 
aaacgagata 
ctcacgcttg 
attgagacca 
attagccggg 
gagtggcgag 
agcctgggcc 



cagctcattc 
attaaatgtt 
tttctgctct 
cagtcagcga 
ctgtagccag 
ggaagatagg 
ttttgctctg 
aacctatcta 
gtcactcctt 
gttcatgttt 
agagtggcat 
tgatagtagt 
ttttgttgtt 
gggcaatctc 
cctcccaact 
tagtagagac 
atccatctgc 
ctgtttctta 
aattgacatt 
gagatgtttt 
ttattataaa 
aactcttcgt 
tggggaagaa 
taatcccagc 
tcctggctaa 
cgtggtagtg 
aacccaggag 
acagagcgag 



tggttaatga 
ttacagctat 
tctgcagagg 
tctccaggaa 
ggagaacaca 
atttgggctc 
atttggatgc 
gaagaaaaga 
atctggaaaa 
tgcctgtgtt 
tatttcatgc 
aggccagctc 
ttttttttaa 
agctcactgc 
agctggaatt 
ggggtttcac 
ctcagcctcc 
gatttgaggg 
taagtcctga 
tttaaatgct 
atcaatagta 
actagtttct 
cgagaggtat 
actttgggag 
cacggtgaaa 
ggcgcctgta 
acagagcttg 
actctgtctc 



ggaaagaaaa 
gtagatattc 
gagagaatta 
caaacttatc 
catatgtaat 
gtttatttgg 
taccaggagc 
ctagagtgag 
ggggaatatt 
tagacataat 
tgatattctg 
ctggatgtaa 
aatagagtct 
agcctctgcc 
acaggcgcaa 
catgttgagc 
caaagtgctg 
tcaactttta 
ccatatagga 
tagtgcttgc 
cttttaagat 
gttacacttt 
ttaaacagtg 
gctgaggtgg 
ccccatctct 
atcccagcta 
cagtgagccg 
aaaaaaaaaa 



20040 
20100 
20160 
20220 
20280 
20340 
20400 
20460 
20520 
20580 
20640 
20700 
20760 
20820 
20880 
20940 
21000 
21060 
21120 
21180 
21240 
21300 
21360 
21420 
21480 
21540 
21600 
21660 



17 



aaaaaaaaaa 
gatggaagtc 
tttttttttt 
ctcagctcac 
agtagatggg 
tagagacagg 
ccccctcctt 
tggctttttt 
tttgagacag 
ctgcaacctc 
gattacaggt 
ttgtcatgtt 
ctcccaaagt 
agaagataaa 
tctttacata 
ataaatgttt 
ttttcttgta 
aagcaatcct 
tgggtctttt 
gaattaatca 
tggtccatgt 
tttctttgag 
cattatatgt 
actctcaagc 
ttttttgttt 
cacaatctcg 
ctcccaagta 
agtagagatg 
cacacacctt 



attaaaacaa 
tctccctaac 
ttgagacaaa 
tgcaggctct 
actacaggca 
gtttcactat 
ggcctcccaa 
tttttttttt 
agtctcgctc 
tgcctcccag 
acgcgccacc 
gtccaggcta 
gctaggatta 
taaggtgaca 
gggtttaaaa 
gctagtgaaa 
gacacagggt 
cctacctcag 
ttaccaaatt 
ggggctatta 
agtttgctgg 
aacaggggcc 
tgcctaggct 
agctgggact 
ttttttctga 
gctcactgca 
gctgggatta 
gggtttcacc 
ggcctcccaa 



aaatatttgt 
ctcactcctc 
gcctcactct 
gcctcccaga 
catggcacca 
gttggccagg 
agtgctggga 
aatgaaaaat 
tgtcgcccag 
gttgaagtga 
acgcctggct 
gccttgaact 
taggcgtgag 
tttttaaggg 
ctttccaaat 
gcaaatacaa 
cctactgtat 
tcttccaaag 
atagtagaaa 
gtaattcatc 
ggattaacac 
ataatcagta 
ggactcgagc 
ataggtgtgt 
gacagagtct 
acctctgttt 
caggcgcctg 
atcttggcca 
agtgctggga 



gttaattgtg 
atttagtgtc 
gtcacccagg 
ttcaagcaat 
tgcccagcta 
ctggtctcga 
ttacaggcgt 
tcaaaatgct 
gctgaagtgc 
ttctcctgcc 
aatttttgta 
cctcgcctcg 
tcgctgcacc 
tcaaagaaaa 
taacagggaa 
tctttttact 
tacccaggct 
tgctgggact 
gcactttttc 
cctgaattaa 
accatgaaag 
gtccttaaat 
tccttggctt 
gccaccatac 
cgttgtgtcg 
cccaggttca 
ccaccatgcc 
ggctgatctc 
ttataggcgt 



atgacaaaaa 
atggcttttt 
ctggactgca 
tctcctgcct 
attttttgtg 
actcctgagc 
gagccctgct 
cttttttttt 
agtggtgtga 
tcagcctctc 
tttttaatag 
tgtgatccac 
tggccacaga 
tgtcaaaaac 
aataattctt 
aaatgtttta 
ggtcttgaat 
acagacatga 
tctaatggtg 
tcagtgatta 
tctaccagga 
gaaatggact 
aagtgatctt 
ctggcttaat 
cccaggctag 
aacgattctc 
tggctaattt 
gaactcttga 
gagccaccgc 



aaaaaaaaga 
tctttttttt 
gtggtgcaat 
cagcctcaca 
tagtttttag 
tcaagtgatc 
cccagactcc 
tttttttttt 
tctcggctca 
gaacagctga 
agatggggtt 
ctgccttggc 
aattttttga 
tagaatgatg 
taccttgaaa 
ttaaattttt 
gcctggcctc 
gccatcacac 
aactatgaga 
taatgctttg 
gatttttttt 
attcccattt 
cccacttcag 
tgagtgtttg 
agtgcagtgg 
ctgcctcagc 
ttgtattttt 
cctcatgatc 
gcccggcttt 



21720 
21780 
21840 
21900 
21960 
22020 
22080 
22140 
22200 
22260 
22320 
22380 
22440 
22500 
22560 
22620 
22680 
22740 
22800 
22860 
22920 
22980 
23040 
23100 
23160 
23220 
23280 
23340 
23400 



18 



aattgagatt 
aaaattgtag 
gattaaatag 
acatacaact 
aagtataata 
catataccct 
cttttcaggt 
ctattaaatg 
catatcctct 
tctcactaat 
tatctcttac 
ctaaaaaata 
gttgaggcag 
ccatttctgt
agtctcgctc 
cgcctcccgg 
gcccgccact 
tagccgggat 
ttgcctccca 
gcatgtgcca 
aagctggcca 
aaagtgctgg 
aaatattagc 
gaaaggattg 
ctccagcttg 
agtatagaac 
tcttaacttt 
aatttgcaat 



tttagatatc 
gaaaatggat 
ataagtaaaa 
gtgcacacag 
aagatactaa 
attttatatg 
aaggagtttg 
gtggaacaag 
cactctaccc 
atttgaaaga 
ctctatttaa 
atgcagtcca 
gaggatcact 
tttttttttt 
tgtcgcccag 
gttcacgcca 
acgcccggct 
ggtctcgatc 
ggttcaagtg 
ccatgcctgg 
ggctggtctc 
gattacaggt 
ccagtgtagt 
cttgagccca 
ggtgacagag 
ttgaaaatct 
ttgtacttac 
ctttttcact 



tattactctg 
atatttgttc 
ctggtgggct 
ttcaggagaa 
aagtagtgtt 
atttttgcca 
acgcccagac 
gacttaaatc 
caagctaccc 
aaacgtacag 
agatgaatat 
ggtgcagtgg 
tgaagccagg 
tgtttgtttt 
gctggagtgc 
ttctcctgcc 
aattttttgt 
tcctgacctc 
attgtcgtgc 
ctaatttttg 
gaacgcctag 
gtgagccact 
ggtgcatgtc 
gaatttcaag 
tgagaccctg 
tccttaacct 
ataaacataa 
taccatattt 



ctaattttgt 
cttggaatgg 
ttatataaca 
ggaggattta 
tccataccac 
caagtcagag 
agattaactg 
tttgccttct 
atgttttgac 
tagataattt 
cagcatttct 
ctcacacgtg 
agttcgagac 
gttttgtttt 
agtggcggga 
tcagcctccc 
atttttagta 
gtgatcccca 
ctcagcctcc 
tattttttgg 
cctcaagtga 
gtgcccggcc 
tctagtccca 
gctacagtga 
tctctaaaaa 
taccataagg 
atattcatca 
tggaattttt 



cacttgcaag 
tttgtgtgag 
tagatgagca 
agttaatcaa 
tttattactt 
ttaggtaaaa 
acttttccaa 
aactcacata 
ccttcttgtg 
gcaagttaat 
gttgtttcta 
taatcctagc 
tagcctgggt 
gttttgtttt 
tctcggctca 
aagtagctgg 
gagacggggt 
tttctttttc 
caagtagctt 
tagagatgga 
tctaccctcc 
ccatttctac 
gctactcaga 
gctatgataa 
atgaagtaaa 
gaaatgatta 
gagaaaaaaa 
ttcatttcaa 



ttgccatcag 
aatacttaag 
aatgtcagga 
caaatttact 
aaagtatcat 
gaaatacttg 
aatcatattg 
cttgcaaaca 
gcaatctggg 
ctgttacgca 
cagtaacata 
acttttggaa 
atgcaagacc 
tttgagacgg 
ctgcaagctc 
gactacaggc 
ttcaccgttt 
actgcaacct 
gggattacag 
gtttctggcc 
ttagcctccc 
aacaattaaa 
aggctgaagt 
tggcattgca 
atagtgcaca 
ctaataagtt 
tatgcaaaac 
tatatttgat 



23460 
23520 
23580 
23640 
23700 
23760 
23820 
23880 
23940 
24000 
24060 
24120 
24180 
24240 
24300 
24360 
24420 
24480 
24540 
24600 
24660 
24720 
24780 
24840 
24900 
24960 
25020 
25080 
cttccttgtg
aacatggctt 
tacaggtaca 
tcaccatgtt 
ctcccgaagt 
tctgcaaaac 
gattagaact 
ggaggctaag 
tgagagaccc 
agctattcaa 
agatcatacc 
tattttttcg 
tctgacattg 
gcagcctcct 
ccacaggtgt 
accatgtggt 
agccactgtg 
tgaataatgt 
tttggatata 
ttgaactgcc 
acaagggttc 
aaagggtatg 
atgatgctca 
gtctattaat 
aaacatttta 
taaaacttta 
aaggcatgca 
gtaagtttct 
gtgttatatg 



ttttttcagt 
attgcagcct 
tgtcaccacg 
gcccaggctg 
gttgagagaa 
ctcataagtg 
gtgtttttag 
tgggagtatc 
tgtctttaga 
gaggctgagg 
actgctctct 
tttttttcca 
ggtcttgctc 
cttcctggac 
gcaccatcat 
tcaccatctc 
tgcagcccac 
ggctatgaac 
tacctaggag 
aacctgtttt 
cagtttctcc 
aagtggtatc 
gcattttgtc 
gttcttttcc 
agctctgcag 
tgtacaaaaa 
ttagataatg 
gccacccaaa 
gggtcataga 



ttgtttgttt 
caatctcctg 
cccggctaat 
atcttgaatt 
caggtgtgaa 
gctaatagag 
gctgggtgcg 
acttgagccc 
aaaaaattaa 
tgggagaatc 
agcatgggtg 
actcatgtac 
tgtcacccag 
tcaagcagtc 
gcctggctaa 
taactcctgg 
gttttttatt 
attggtctaa 
tagaattact 
ccataggggc 
acatttgtta 
tcgttgtgat 
atgtacttat 
cattttttaa 
cataactact 
taggcagctt 
tagttacact 
ttctttcttg 
gttaaaagag 



ttgtcaccca 
ctcccattca 
ttttattttt 
cctgggctca 
ccaccatgct 
gaatatagta 
gtggctcacg 
gggagttcaa 
ccaggtgtgg 
gcttaagcct 
acagagcgag 
acccgccacc 
gctagagtgc 
ctaccacctc 
tttttgtact 
gctcagtcag 
aatggatatt 
atatctgttt 
gaattatatg 
tgcaccattt 
tttttcattt 
tttgatttgc 
gtaccatttg 
ttgggttgtt 
caaccctgtc 
actagattta 
attggctaat 
atttgatgta 
aaatgtctat 



ggttggagtg 
gcccctcaag 
attttggtag 
agtgacccgc 
ccacctctta 
aagcaaaggg 
cctgtaatcg 
gaccagcctg 
tggtgcacac 
aggaggcgga 
acccagtcta 
ccacccctgc 
agtagcacaa 
agcctcccaa 
ttttgtagag 
tccacttttg 
tggattgttt 
aagtcccggc 
gtaactttct 
tgcattccca 
tttaaaataa 
atgtattttt 
tgtatcttct 
tttatgttta 
acatggtaag 
atcttagtcc 
aatttaaact 
gtctggttgg 
gagaaactag 



aagtggtaag 
tagctgggac 
agatggggtt 
ccacctcagc 
gtctttacaa 
ggatatcact 
caacattttg 
ggcaatatag 
ctgtggtccc 
ggttgcagtg 
aaaaaaaaag 
tttttttttt 
tcaactcact 
gtagctggga 
atagggtttc 
cctcggcatg 
ccatctattg 
tttcaatact 
gtttaaattt 
ccagcagtgt 
tagtcatcct 
ctaatgactt 
ttggaaaaat 
tcaattttgt 
attgacccag 
atagtttgct 
acaagtggtt 
ttgaatttga 
ggactgttgg 



25140 
25200 
25260 
25320 
25380 
25440 
25500 
25560 
25620 
25680 
25740 
25800 
25860 
25920 
25980 
26040 
26100 
26160 
26220 
26280 
26340 
26400 
26460 
26520 
26580 
26640 
26700 
26760 
26820 
gagctaatgt
ttttctgcta 
acccctacat 
ctcagttctt 
gcccaagctg 
aagtgattct 
ctggctaatt 
aatggcacga 
tcagcctccc 
tttttagtag 
tgatccaccc 
gcgttttttt 
ttgatctcct 
atgagccacc 
ccaaaatctt 
agttgcatga 
cacttcaagt 
tctctactaa 
ctcaggaggc 
agatcatacc 
aaaaaaaaaa 
ctttttatat 
tcctttctaa 
tgtttttaaa 
agggtctctc 
gcgattctcc 
agctaatgtt 
aactcctgac 



taaaggattt 
taaatgtatc 
taagggattg 
tttttttttt 
cagtgccatg 
cctgcctcag 
tttttttttt 
tcttggctca 
aaatagctga 
agatggggtt 
gcctcagcct 
gtgtgtgttt 
gacctcgtga 
acgcctggct 
ttgccacagc 
gtccactgaa 
agacgaggtc 
aaatacaaaa 
tgaggcagga 
atttcactcc 
aggaaagaaa 
ttcgataatt 
attttggtaa 
actttttttc 
tcagtggtgc 
tgcctcagcc 
tgtattttta 
cccaggtgat 



tggaggcctt 
taaaggataa 
aaataatttg 
tttttttttt 
gcacgatctc 
cctccagagt 
tcaagacaga 
ccgcagcctc 
ggcaacaggc 
tcaccatctt 
cccaaagtgc 
tcagtagaga 
ttcacctgcc 
gtggctcagt 
ttctcctggg 
agctttgctt 
aggagatcga 
cttagctggg 
aaattgcttg 
agcctgggcg 
agtgctccaa 
tttaaaataa 
ttttcattgt 
tagttctctc 
gacctccctt 
tcccaaatag 
gtagagattg 
ccgcctgcct 



ttgtgcactg 
tcagtgtaag 
aatatgggtt 
tttttttttg 
agctcgctgc 
agctaggact 
gtttcgctct 
cgcctcctgg 
gtgcgccacc 
ggccaggctg 
tgggattaca 
tggggcttca 
tcagcctccc 
tcttaactgt 
catgctctga 
attttctcca 
gaccatcctg 
cgtggtagcg 
aacccgagag 
acagagtgag 
atgctgggct 
taaccatcct 
tttggaaact 
tctttttttt 
actgcaacct 
ctgggagtac 
ggtttcatca 
cagccttcca 



gagaccattg 
ttatgggctg 
tcagttcttg 
agatggagtc 
aacctccgcc 
acaggcacgt 
tgtcacccag 
gttcaagtga 
acgcctagct 
gtctcgaact 
ggcgtgagcc 
ctgtattagc 
aaagtgctgg 
tcattcattc 
aattcatttt 
ccactttcag 
gccaacatgg 
tgtgcctgta 
gcagaggttg 
acactgtcaa 
catctctgct 
aatgggcttg 
cactaatccc 
tttttttttt 
ctccctcctg 
aggcacccac 
tgttggccag 
aagtgctggg 



gaagattgga 
tagtttgcca 
tatggtctgg 
tcactctgtt 
tcctggcttc 
gccaccacat 
cctggagtgc 
ttctcctgcc 
aatttttctg 
cctgatctca 
acctcacctg 
cagaatggtc 
aattataggc 
agggtcccag 
ttgtctgttt 
agttcacagt 
tgaaaccccg 
gtcccagcta 
cagtaagccg 
aaaaaaaaaa 
ggcttctctc 
atatggtccc 
ttcaaacaga 
tttttgagac 
ggttcaagca 
ccccacgccc 
gctggtctcg 
attacagcca 



26880 
26940 
27000 
27060 
27120 
27180 
27240 
27300 
27360 
27420 
27480 
27540 
27600 
27660 
27720 
27780 
27840 
27900 
27960 
28020 
28080 
28140 
28200 
28260 
28320 
28380 
28440 
28500 



21 



tgagccaccg 
tgctgatgtt 
tgtttgttaa 
tgggaggccg 
tggagaaact 
tcatcccagc 
tccagggagc 
ctcaaaaaaa 
aatggaagca 
ttgagattta 
tattccacgt 
tcctgttttt 
tggagtcttg 
ctctgcctcc 
ggtgcccgcc 
gttggccagg 
gtgtgagcca 
tcatgtacag 
ttgactcatg 
ctgggtgcca 
gtaatatttt 
ttgtatcttt 
atacgcttac 
actatatata 
atatatattt 
atattttttt 
atgatcttgg 
tcagcttcct 
tttttagtag 



cacctggctc 
ataatccaag 
aaatcagatc 
aggcaggtgg 
ccgtctctac 
tacttgggag 
tgagatcgtg 
aaaaatcaga 
tgtagtatgt 
tcccttttcc 
gtggagatac 
gttattgttg 
ctctgttgcc 
ccggttcaag 
accacaacca 
ctggtcttga 
ccatgcccat 
gtcttagaag 
tggtaaatat 
ttttgtttcc 
ggtattttta 
ggattttaat 
ttgccaatta 
tatatatata 
atatatacta 
ttttttttga 
cttaccgcaa 
gagtagctgg 
aggtggagtt 



tagttctctt 
cccaaatgga 
tgctgggcac 
atcccctgag 
taaaaataca 
gctgaggcag 
ccattgtact 
tctgtcccta 
agccttttgt 
tacatatgtc 
cacaatttgt 
ttgttgttgt 
caggctagag 
tgattctcct 
gctaattttt 
actcttccct 
cctcctgttt 
tttgcatttt 
atgtttaatg 
taccagcaaa 
tgttcttttg 
ttgtacttcc 
tgtcttaatt 
tatttttatg 
tatatattta 
gatgaagtca 
cctcaacctc 
gactacagac 
tcgccatgtt 



ctaatcttac 
tattcttata 
ggtggctcaa 
atcaggagtt 
aaattagccg 
aagaattgct 
ccagcctggg 
tggttttatc 
gtctcgcttc 
agtagtttgt 
tactccattc 
ttaattaatt 
tgcagtggcg 
gcctcagcct 
gtatttttag 
ccttggtctc 
ttaaatttta 
tcttgggtaa 
tcataagaaa 
cataatgaaa 
tttttttgcc 
ctaacaacta 
ctttggcaat 
ctatatatag 
tatatactat 
tgctctgttg 
tgcctcccag 
acatgccacc 
gtccaggctg 



tatagtaatc 
cattaaatgt 
tgcctgttat 
cgagaccagc 
ggcgtggtgg 
tgaacctgtg 
caacaagagc 
ttcactgcat 
tctccctttg 
ttctttttaa 
actaaatgat 
aattaatttt 
caatcacagc 
cccaagtagc 
tagagatggg 
ccaaaatgct 
tgaataaagg 
atatgtagaa 
ctaccaaact 
ggtccattac 
attcaaaatt 
atgatgtgga 
acttatattt 
agtatatata 
atatatatat 
accaggctgg 
gttcaagtga 
atgcccggct 
gtcttcaact 



aaaattttag 
tggaatatca 
cccagcactt 
ctgaccaaca 
cacgtgcctt 
aggcggaggt 
gaaattccat 
gtcatatata 
cataatattt 
tgctgaatag 
ttgggttgtt 
ttttttgaga 
tcactgcaac 
tgtgattaca 
atttcaccat 
gggattacag 
tactgtaaat 
gaagagattg 
gtttttccaa 
cctttgtctc 
gggtgcattg 
gcatcttttt 
gactctgcct 
tttttatact 
atatatatat 
atgcagtggc 
ttctcctgcc 
aatttttgta 
cctgacctca 



28560 
28620 
28680 
28740 
28800 
28860 
28920 
28980 
29040 
29100 
29160 
29220 
29280 
29340 
29400 
29460 
29520 
29580 
29640 
29700 
29760 
29820 
29880 
29940 
30000 
30060 
30120 
30180 
30240 



22 



ggtgatccac 
ccagctggaa 
tgacaaacta 
cagagtcttg 
gggttcacgc 
ccatgcctgg 
caggatggtc 
gattacaggc 
gaaactgcag 
tgaaaaactg 
ggtggttcat 
tcgggagttt 
aaaattaggc 
caagaatcac 
ctccagcctg 
aaggagcatg 
gtactcattt 
actcagaata 
acaaaaatac 
aattttagtt 
tttttttttt 
tggtacaatc 
gtcccccaag 
tttggtagag 
cagtcctcct 
agtcttcata 
accagccgta 
gatgaatatg 



ccgcttcagc 
taagcaatct 
tgactcaata 
ctctgtcacc 
cattctcctg 
ctaatttgtt 
tcgatctcct 
atgagccacc 
gtatgcatta 
gaaacaacct 
gcctgtaatc 
gagtccagcc 
aggtgtggtg 
ttgagcctgg 
ggcaatagag 
ggaatgggaa 
tgtttttaaa 
caagtgctat 
atgtaaacag 
aatttataac 
tgagtcaggg 
acatcttagt 
tagatgggac 
ataaaatctc 
gccttggcct 
aacttttctt 
gtatataata 
gttagttatt 



ctcccaaaag 
taaaaagcag 
gtttcatttc 
caggctggcg 
cctcagcctc 
ttttgtattt 
gacctcgtga 
gcccctggcc 
ggggacaggt 
caatatctta 
ctagcacttt 
tggccaacat 
gtgcacacct 
gaaatggagg 
tgagactccg 
gaggcactaa 
attttaagcc 
tacactattt 
gtgttttctt 
aaactcttaa 
tcttgctctt 
gcaacttcta 
cacagatgtg 
accatgttgc 
cccaaagtgc 
tacatgtcct 
ctgataattc 
ttcagggtga 



tgctgggatt
tttgttgatt
ttttttgttt
tgcagtggct
cggagtagct
ttagtagaga
tccacccgtc
tacttttatt
agaaaaatgt
ataatagaaa
tggaggctga
ggtgaaaccc
gtaatcccag
ttgtagtgag
tctcaaaaaa
aaaaggaact
aaatatgaca
tctataattt
tatggaaaac
taatttcttt
ttgctcaggt
cctcttcagc
tatccccaat
ccaagctggg
tgggattata
tatcaagtac
tataacataa
agaaacagaa



acaggcatga
tctggtgaat
gtttgtttgt
cactgcaagc
gggactacag
cagggtttca
tcggcctccc
tcttaagcat
tacagcattg
aaatggttga
ggcgggcgga
cgtctctact
ctacttggga
ccgagatcat
aaaaaaaaaa
tcttttgtat
ttgttatcac
tccctttttc
aggaaggtga
ttcttcttct
tagagtgcaa
tcaagtgacc
catggttaat
cttgagctcc
ggcgtgagca
tttttgagca
gaaattgacc
gaatcgggga



cccaccgtgc
agagaaaatg
ttttttgaga
tctgcttcct
gcgcccgcca
ctgtgttagc
aaagtgctgg
atatcttaaa
tttgaaatgt
ggctggctgt
tcacctgagg
aaaaatacaa
ggctgaggca
gcctactgca
aaaaaaaggg
ctgttacatt
ttttgttgcc
aaactaaaaa
gcatccaaat
tctttttttt
tggtacaatc
tctctcctca
ttttaaattt
tgggctcaag
aactgtgccc
cctactgtca
tgtttaaggg
ggtagtacat



30300 
30360 
30420 
30480 
30540 
30600 
30660 
30720 
30780 
30840 
30900 
30960 
31020 
31080 
31140 
31200 
31260 
31320 
31380 
31440 
31500 
31560 
31620 
31680 
31740 
31800 
31860 
31920 



23 



agtcataagg 
tggccaggtg 
gaggatcaca 
tctactaaaa 
cgggaggctg 
atcgtgccac 
aataaataaa 
gaaatttcaa 
gaggccaggt 
gattacttga 
caaaagaaat 
cgggaggcca 
taggattgtg 
aaaatgaggc 
gggtggatca 
cttaaattac 
aggctgagac 
caccactgca 
aagtctaaga 
aaattaattt 
tactttttga 
atgttttaaa 
tgtaatctca 
agaggaggta 
tttttgtttc 
attcctaaat 
gggctgataa 
tccccataat 
taatgtttct 



agacggcatt 
cggtggctcg 
acaaggtcag 
atacaaaaat 
aggcaggaga 
tgcactgcag 
taaatagact 
ataataataa 
gtggtcactc 
gcctaggagt 
acacaaaaat 
agggaagagg 
cccctgcact 
cgggcgcggt 
cctgagttca 
aaagattagc 
agcagaatcg 
ctccagccta 
agttaatttt 
tcaaatatgt 
ttatgctgag 
ctagttgttg 
tacagcagtt 
cccagcaatg 
attgttattg 
aaaatgtctt 
gcaaaatagg 
ggagttacat 
cagacttgtt 



tcttagtcac 
tgagcgcctg 
gagatcgaga 
tagctgggtg 
atcgcttgaa 
cctgggctac 
tcgtctgttt 
ggaatagtat 
aggcttgtaa 
tcaagaccag 
tagctaagtg 
atggatcact 
ctagtgtggg 
ggctcacctc 
agatcagcct 
tgggtgtgat 
cttgaacccg 
ggtgacagag 
cattcagaca 
gatttgagta 
tttttaggtc 
ggttcaaaga 
ctcatgatgc 
aacacattat 
aaatcatatt 
tctgtaatac 
actcacggcc 
atacacatta 
cacaaccctg 



tttgtgtggt 
taatcccagc 
ctatcctgaa 
tggtggcacg 
ccagggagtc 
agagagagac 
catcaagagt 
ggaaattttt 
tcccagcact 
cctgggcaac 
tggtggcaca 
tgagcccagg 
tgacagcaag 
tgtaatctga 
ggccaacatg 
ggtgcacacc 
ggagggggag 
caagactctg 
aatgttcaaa 
taacgatcac 
cttttaaagc 
aaatagaaat 
aatcaattgt 
atcagatttg 
tgtatttttc 
agcattttag 
ccaaaatgtt 
caatgataaa 
ttttttatgt 



gtttataata 
actttgggag 
caacatggtg 
tgcctgtaac 
ggaggttgca 
tccatctaaa 
cattgtatta 
gtttgattaa 
tcgggagacc 
atggcaagac 
tgcctgtagt 
agttggaggg 
actctgttta 
gcactttgag 
gtgaaaaccc 
tgtaacccca 
gttgcaatga 
tctcaaaaaa 
aataatagaa 
tttacagaag 
ttaacttgta 
gtgttattaa 
taaataaaaa 
aatatgagat 
aaaagtatat 
gttataagga 
gataaacatc 
aatactaaga 
attacctctc 



agacttcatg 
gccgaggtgg 
aaaccccgtc 
cccagctact 
gtgaaccaag 
taaataaata 
tattgatttt 
atggggacat 
agggcaggag 
cctatctcta 
cttagctact 
tgcagtgagc 
aaaaaaaaaa 
aggccaaggc 
cgtctttata 
gctactccag 
gccgagattg 
aaaagaaaaa 
ataaaacaaa 
ttattctaca 
tcaggaatta 
atgccacact 
cttcctctag 
taaacaatgc 
atacttaaaa 
tcaataccat 
atgaccatat 
agttatatag 
aaaagcatga 



31980 
32040 
32100 
32160 
32220 
32280 
32340 
32400 
32460 
32520 
32580 
32640 
32700 
32760 
32820 
32880 
32940 
33000 
33060 
33120 
33180 
33240 
33300 
33360 
33420 
33480 
33540 
33600 
33660 



24 



tttgcaatat 
ggacagaagg 
atagctgtta 
tgatatattc 
tctttctttt 
tacccaggct 
caagtgatcc 
cttagctaat 
cgctcttttg 
cccggattca 
catgccaccg 
gccaggctgg 
gtgggagtac 
gtagagacag 
cctcctgcct 
gattgtattt 
cagcactttg 
gatcaacatg 
cgcacctgta 
gcggaggttg 
aactctgtct 
tattatacca 
acttagaatt 
tataacaaca 
tgttgtgact 
ccaatttcaa 
tgaaatctat 
ttttataggt 



ttagtgtact 
agcaagctgt 
tagttacagg 
acatgattgt 
cttttctttt 
ggagtgcagt 
tcctgcctct 
tattattatt 
cccaggctgt 
ggctatttat 
tgcccagcta 
tttcgaactc 
aggcatgagc 
ggtctcactg 
cagctttcca 
atcttaataa 
ggaggccgag 
gagaaaccct 
attgcagcta 
tggtaagcca 
cagaaagaaa 
gtaagagtaa 
tgaattttaa 
atattaatat 
gcagcatggt 
agattttgaa 
gaaataccac 
attatatgta 



cttcatacga 
ggaatggtat 
acaaggtaag 
ccagatttca 
cttttttttt 
gatgtgatca 
gcttcccatg 
attattatta 
agtgcaatgg 
tctcctgcct 
atttttgtat 
ctgatctcaa 
cactgtgggc 
tgtttctgtg 
aagttgtggg 
atgtatttgg 
gtgggcgaat 
gtctctacta 
cttgggaggc 
agatcgcgcc 
gaaaagcaaa 
aaaacaattt 
tcttattgaa 
ctgtgaccaa 
ggatagcatt 
acgtggggaa 
agtaggttgg 
ttctttgaga 



tctatacaaa 
aagaaaggta 
attgtatttg 
gatctattta 
tttttttttg 
tagctcactg 
tagctgggac 
ttattattac 
tgcaatctcg 
gagactcctg 
ttttagtgaa 
gtgatctgcc 
cacacttagc 
actgatctta 
attacaggca 
gccaggcaca 
cacctgggat 
aaaattcaaa 
tgaggcagga 
attgcactcc 
acagtatttc 
aattaaagta 
gaggctgttt 
attggtgcat 
agtaagttag 
aaaatatttt 
aaatcatcat 
cagggtctca 



taatttttta 
ttgaagaact 
tttatagcca 
tttatttatt 
agacagggtc 
caacctcaaa 
cacaggcgca 
tagtttttga 
gctcactgca 
agtagctggg 
gatggggttt 
cacctcagcc 
taattaaaaa 
aacttctggc 
tgagccactg 
gtggctcatg 
caggagttcg 
attagccagg 
gaattgcttg 
agcctgggca 
atttaattaa 
tgatatatgc 
tagttttatt 
gcatgggcaa 
cattggttgt 
agaccccatg 
gagaaactgt 
ccctgttgcc 



ttttaaagca 
ggaaaaagga 
tcccaaatta 
tatttttctt 
tttcctctgt 
cttctgggct 
cactaccata 
gacagagtct 
atctccgcct 
attacaggtg 
tgccacgttg 
tcccaaagtg 
aaattttttt 
ctcaagtgat 
tacccagcca 
cctgtaatcc 
agaccagcct 
cgtggtgtcg 
gacccaggag 
acaagagtga 
aaatgccttg 
tcttttcatc 
tagatatggt 
tgaaaagcta 
aaaatgaatc 
aaataagacc 
aactattttt 
caggctggaa 



33720 
33780 
33840 
33900 
33960 
34020 
34080 
34140 
34200 
34260 
34320 
34380 
34440 
34500 
34560 
34620 
34680 
34740 
34800 
34860 
34920 
34980 
35040 
35100 
35160 
35220 
35280 
35340 
tgctgtgatg
caactcaccc 
ctctttatgt 
cccagagtgc 
aaatttcttc 
ttctgtataa 
aatattagtt 
aaagagctag 
tacaacttct 
tcaatgaaac 
ttttaacata 
ttttttttga 
gctcactgca 
gctgggacta 
cagggtttca 
tcggcctccc 
aacttctaat 
atttatacca 
tataaataag 
tcatgctacg 
acatatttac 
gtcccccaac 
ttcatattat 
acacctaaaa 
aatgcaattt 
atttgtttgt 
tgtttttcat 
tcctacagtc 
tcctcccacg 



tggtcacctc 
tcccatgctc 
tgcccaggct 
tgggattaca 
tgttttcttt 
agactgtgac 
gggaaatgta 
acgccttcaa 
aggtatcaat 
tttaatttgt 
aagaaaacgt 
gatggacgct 
agctctacct 
caggcacccg 
ccatgttagc 
gaagtgctgg 
tattcttctc 
agggctagca 
gttttattgg 
gtgacagaat 
tgtctggccc 
ttcttattat 
acctctaaaa 
aataactcca 
ccaaatttct 
atcaggatct 
tttcttccaa 
tgacttttgc 
ttaaatgata 



tcactgcaac 
ggctaatttt 
ggtcttgaac 
gaagtgagcc 
cttttgggta 
tccccatgaa 
gatattttaa 
gctaaaatga 
taatgtataa 
agaaagaaat 
ataacatata 
tgctctgtca 
cccaggttca 
ccaccacgcc 
cagaatggtg 
agttacaggc 
tcagctttga 
aactgcagct 
cacacagcca 
tgaatagttg 
tgagaaagcc 
tttgaaacaa 
gataaggatg 
aatttctttt 
aattttctca 
taaaaagatc 
tttatttgtt 
tgcatgcatc 
gttggatata 



atccacttcc 
tttttttttt 
tcctgggctc 
acaacacctg 
tacattttct 
agtagtttgg 
ttaatttttt 
tgactaattt 
tttgatgtgg 
agatcagtga 
caaaaacaga 
cccaggctgg 
cgccattctc 
ccgctaattt 
tccatctcct 
gtgagccaac 
taattatcac 
gatgggccaa 
ttcctttcct 
atacagagac 
tgccatcttc 
attattgact 
tcaaaataaa 
aacattatcc 
aaatttaata 
cgtatgtctc 
gatgaaacca 
ttttgtggta 
gagccttgat 



tggggctgaa 
tttttaagta 
aagtgattct 
gcctaaagat 
tctttttttt 
gtgataattt 
tctttcaggt 
ggttatggcc 
gatgtattgg 
ttgaaaatgt 
agagcataat 
agtgcagtgg 
ctgcctcagc 
ttttgtagtt 
gacctcgtga 
gcgcccagcc 
ctcactatca 
cccagctcac 
ttaggtattg 
aacactgccc 
acttgacatc 
tattttccat 
atcacagaaa 
aaaaaaactg 
atttttacta 
ttttaatgta 
ggtctttgtc 
tgttttaaca 
cacattgaaa 



gtgatccttg 
gagaccagat 
gctttgtctg 
agtattttta 
aaaaaaaaat 
atcgtgaaac 
gaacagtgtg 
aaggaccgct 
aaatgtgtgt 
ggtccaggct 
ggacttcttt 
cgtgatcttg 
ctcctgagta 
ttagtagaga 
tccacccgcc 
gagcataatg 
gtcttgattt 
tgcctattac 
tctatggctt 
tcaaaaccta 
ccacttcact 
ttataaatat 
taccattatc 
acccccccaa 
ttttttaata 
caggttcatc 
atgtaatatt 
tgttcttctt 
gttgattttt 



35400 
35460 
35520 
35580 
35640 
35700 
35760 
35820 
35880 
35940 
36000 
36060 
36120 
36180 
36240 
36300 
36360 
36420 
36480 
36540 
36600 
36660 
36720 
36780 
36840 
36900 
36960 
37020 
37080 
attttttaat
tccatcttgt 
gtcattcttt 
cgtaggtttt 
cctactacat 
agcaggtaaa 
gggtcttttt 
attattatta 
gcagtggtgc 
attcagtctc 
tttagtaggg 
tgatccatcc 
tgtaatccca
catcctggct 
tgtggtggcg 
aacccaggag 
aaagagcgag 
aagtgctggg 
taataatatc 
gtgctttaaa
ttaggtagaa 
tttactccta 
gccgttactg 
atttgccttc 
attaaagttt 
tctttaaagc 
gagcattttg 
tatgatatag 



gattagacta 
gaggttagca 
cagcattcat 
aaatatatat 
atgagtggct 
ccttatttat 
tatctaattt 
ttattattat 
aatctcggct 
ccgagtagct 
acggggtttc 
acctcagcct 
gcactttggg 
aacacggtga 
ggcacctgta 
gcggagcttg 
actctgtctc 
attacaggca 
tcatattttt 
aatgttggtt 
agatatattc 
tatggtcatt 
ctttcagatg 
ttcctttcta 
gttgcaataa 
ttttcagtgt 
taacatcttt 
gacattagaa 



cttcctaggt 
gcagttaact 
caactgtaat 
taataagtta 
ctaccatacg 
ttgtagccct 
gtaatatgaa 
tattcgagat 
caatgcaccc 
aggattacag 
accatgttgg 
cccaaagtgc 
aggccgaggc 
aaccccgtct 
gtcccagcta 
cagtgagccg 
aaaaaaataa 
tgagccacca 
gaccattgaa 
tttatttatc 
ataaacacat 
gtgtttgtgg 
agtaacaagg 
aatggaggtt 
aataattgat 
atatataaag 
caaaattaat 
gtattaatat 



ggttgtgttt 
atgactgatg 
agaaactttt 
aatctaccac 
ttggtggatg 
gtgacttggg 
gctattatta 
ggagtcttgc 
tccacctcct 
gggcctacca 
ccaggctggt 
tgggataggc 
gggtggatca 
ctagtaaaaa 
cttgggaggc 
agatcatgcc 
aataaaataa 
cgcccagctg 
ttattaaagg 
tgtttacctg 
acatatacat 
gatttttgcc 
tagtgttccc 
attactgtgt 
ggttctaatt 
tatatatcat 
aagagttggt 
caattagaga 



gtgtgttcat 
tccagtgata 
acttgtcttc 
tcaaaaaaag 
gaaatagtag 
gtaagttatt 
ttattattat 
tttgttgccc 
gggttcaaac 
ccacacctaa 
ctcaaactcc 
cgggcacggt 
cgaggtcagg 
aatacaaaaa 
tgaggcagga 
actgcactcc 
atagataaat 
aagctaatat 
aacctaaacc 
actgtcatca 
tttattcatt 
cgcacatagt 
tggcttctgt 
cagatataat 
ggtactttct 
acagaataaa 
ttttatgttt 
ctcatctttg 



cattagttgt 
ctttaattct 
tgtttgatgg 
tagaacaaaa 
atatttggta 
ttgcatgtct 
tattattatt 
aggctggagt 
tattctcgtg 
tttttgtatt 
tgacatcaag 
ggctaacgcc 
agattgagac 
actagccagg 
gaatggcatg 
aggctgggca 
aataataata 
tattagctaa 
atagtagtaa 
cctcctttgc 
ctttaattca 
taaaaatgca 
ggctgacagg 
taaatagtgt 
acgtgtttta 
tttgattgtg 
tgtttgatgt 
aatgtgactt 



37140 
37200 
37260 
37320 
37380 
37440 
37500 
37560 
37620 
37680 
37740 
37800 
37860 
37920 
37980 
38040 
38100 
38160 
38220 
38280 
38340 
38400 
38460 
38520 
38580 
38640 
38700 
38760 



27 



tgtactttct 
gaaactaggc 
cctcaaaaat 
aaaggtgaca 
aaggtgggcg 
taatctctac 
ctactcagga 
ccaagatggc 
aaaaaacgcc 
agcagatcac 
ccccgtctct 
ccagctactt 
tgagccgaga 
aaaaaaaaaa 
tgggaggttg 
accacaccac 
aacaaaccaa 
ttcccatata 
actatctgtc 
cccaagtgcc 
aatgaagttg 
aatgttaaat 
ttgccgacat 
aaggttcact 
agagaggcta 
ttttccatcc 
ccagcacata 
ttgtcttagg 
gtcacctctt 



tatttgtgtt 
atttaaagat 
gctattacct 
tcggctgggc 
gatcacctga 
taaaaataca 
ggctgagtca 
accactgcac 
gggcactgtg 
ttgaggtcag 
actaaaaata 
gggaggctga 
tcccgccact 
aaaaaaaaaa 
aggcatgaga 
tgcactctag 
accacctctt 
tttgtcacct 
ttctccctgt 
tacaacagtg 
tttataaccc 
gaagctatct 
caaaattaat 
atcactaaac 
catcctctca 
taattagata 
ataaacaaat 
tatatccact 
ttttcttttt 



agtagaggag 
ttaacgtttt 
atttatgaaa 
gcggtggctc 
ggtcaggagt 
aaaaattagc 
ggaggatggc 
tccagactgg 
gctcacacct 
gagttcgaga 
caaaaaaatt 
ggaaggagaa 
gcactccagc 
tagccgggca 
attgctcgag 
cctggatgac 
gtccactcct 
tctaataaac 
tggaatataa 
tctagcacat 
ttggcactat 
taaaaatata 
cttttcatct 
caaaaataaa 
aaccagtcag 
cctatctggc 
atgttactat 
cccagacctt 
tttgagacgg 



agaacaaaaa 
ggatatttta 
tatctttaaa 
acacctataa 
tcacgaccag 
caggctggtg 
ttgaacccag 
gtgacagaga 
gtaatcctag 
ccagctgggt 
agccgggcgt 
tggcgtgaac 
ctgggtgaca 
tggtggcggg 
cctgggaggc 
agagtgagac 
aatctccctt 
tgtgtaactt 
actctatgga 
gctagttacc 
acctgataca 
tatatatata 
attgcaggtt 
taatatggat 
tattaggaat 
accacatctc 
taatacacgt 
ctgagttatt 
agtcttgcct 



gaagatatgt 
aagttggtgt 
agtgtggagt 
tcccagcact 
tctggtaaca 
gtacgcacct 
aaaatggagg 
gagactccgt 
cactttggga 
gcctgggcaa 
agtggcgggc 
ccgggaggcg 
gagcgagact 
cacctctaat 
agaggttgca 
tttgtctcaa 
actgtgtttc 
attttttatg 
ggagggatcc 
gaataaatat 
gtgtagggac 
taacagccag 
ttacagttaa 
ggtcttttcc 
tcccaggatc 
tgaaaagctc 
ttgtacacaa 
gaaagacagg 
tatcacccag 



aatgtaatat 
ctgttttcac 
ggtagatggg 
ttgggaggcc 
tggtgaaacc 
gtaatcccag 
ttgcattgag 
ctcaaaaaaa 
gactgaggcg 
catagtgaaa 
gcctgtagtc 
gagcttgcag 
ccgtctcaaa 
ctcacctact 
gtaagcagag 
aacaaaaaca 
acttattttt 
tttattgtgt 
ttgtctatat 
ttgccaagtt 
ttaataaatg 
atacgccaat 
gctttgctct 
catttttaac 
attctctcca 
caatttttaa 
gtcattttta 
aatttgtgat 
gccagagtgc 



38820 
38880 
38940 
39000 
39060 
39120 
39180 
39240 
39300 
39360 
39420 
39480 
39540 
39600 
39660 
39720 
39780 
39840 
39900 
39960 
40020 
40080 
40140 
40200 
40260 
40320 
40380 
40440 
40500 
aatggcgtga
tcatcctcct 
ggatggtctc 
tggtccacag 
tgaagcagca 
gaactcatgc 
tgagctttta 
gtgcagtggc 
gaggtcagga 
agaaaaatta 
cagaagaatc 
cctccagcct 
aaaggaatga 
tatagactct 
gaaaagtgat 
ttatgattat 
tctctttatg 
ggatgggtat 
gacaaagctg 
ttagataaat 
gtaatcctag 
ccagcctgga 
gagtggtgat 
gcttgagctc 
gggtgacaga 
tatgtcagga 
aggggagggt 
gatcatggat 



tctcggctca 
gagtagctgg 
ctctttgaga 
agaagtaaga 
aagaggccaa 
ggagagggat 
ccgcctgtga 
tcatgcctat 
gttcgagacc 
gctgcgtgtg 
acttgaaccc 
gggcaacaaa 
aatactctga 
aagggtagaa 
agaggcttgc 
agttgaaggt 
gatctcataa 
tagaaatatt 
gttaaagaaa 
gaataaaagt 
cactttggga 
cagcgtactg 
gtgcacctgt 
aggaggtcga 
gtgagaccct 
aaatgtgatg 
acaggttgag 
ggcctggttt 



ctgcaacctc 
gattacaggt 
agttgccatg 
gccaatacaa 
tgtgttccag 
tgtcacaaca 
aatggggagc 
aatcccagct 
agcctggcca 
gtggccatgc 
gggaggcgga 
gcgagactcc 
cttaaatttt 
ccagggagag 
accagggtag 
cgaacaacta 
tacagtagga 
ttgcaagggt 
gaaaagaaag 
tatgaaagaa 
ggttgagtcg 
agaccccatg 
agtcctagct 
gactgcagtg 
gtctcaaaga 
acatgaacaa 
acatgaagtt 
gtcataggaa 



tgcctcccgg 
ggccctagtt 
agttagccat 
cgccttaaga 
caaggagatt 
cgcagggctg 
agttacaaag 
ctttgggagg 
acatggcgaa 
ttgtaatccc 
ggttgcagtg 
gtctcaaagc 
agaaggatca 
caattaggac 
tatcaggaga 
gatttgctga 
gaaacagaca 
gtaaaaggtg 
cttcctagaa 
tgtgagggct 
ggggtattgc 
tctaaattta 
acttacttgg 
agccatgagt 
aaatgaaaag 
atactcaaag 
gggattgccg 
ggagcttggg 



gttcaagcga 
tgggtttttg 
gatgatacct 
gaaagtacct 
atagtagaga 
taaaggccac 
atttttaaga 
ctgaggtggg 
accttgtctt 
aaatactcgg 
agccgaggtt 
cagaaaaaaa 
ctctggttgc 
actgttacag 
aatggtaaca 
tagattggat 
cataaatagt 
ctaagaaagc 
gtgaaaccta 
gggcacagtg 
ttgagtccag 
aaaaaaaaaa 
gaggctgagg 
acaccactgc 
aaaaagaatt 
gcaagaaaag 
aggcgtaaga 
ctcttatctg 



ttatcctgcc 
gttttaaaca 
gggacaataa 
ggtatgtttg 
tgaagtcaga 
catgaggact 
aagcagccca 
tggatcacct 
tactaaaaat 
gaggctgagg 
gcgccactgc 
gatttttagg 
tatgttgcat 
taatctagga 
agtggttgga 
aaaatggtcc 
catcactgca 
tttcttgtgt 
aaatatacac 
gctcatacct 
gagtttgaaa 
aaatagccag 
caggaggatt 
actctagtct 
tgagaaaaga 
catggtgagt 
ggtataggca 
tgggcaatgg 



40560 
40620 
40680 
40740 
40800 
40860 
40920 
40980 
41040 
41100 
41160 
41220 
41280 
41340 
41400 
41460 
41520 
41580 
41640 
41700 
41760 
41820 
41880 
41940 
42000 
42060 
42120 
42180 



29 



gaagccacta 
gatcactcag 
taatttaaat 
tttaacagaa 
ggaggccgag 
gcaaaacctt 
aatcccagct 
gcaatgagct 
ccaaaaaaaa 
gtgtctatga 
tgtactgtta 
aaaagaaatc 
ttttattaca 
taattcacat 
ttttactgag 
aaaatcagat 
agccagttct 
tgctatttga 
tttgtcaaac 
tgtattcatt 
cacttgggga 
tgggcttcag 
gatgtggggg 
tagacgtatg 
gccctggcta 
tttttagagg 
gtattgtata 
ttttctgcct 
tttcattctg 



aagggtttta 
atgactgtgg 
cattgattaa 
gtgaattggg 
gcgggcagat 
gtatctacta 
attcgggagg 
gaggtcgcac 
gaaaaaatat 
ctcctcctcc 
cactgtatgt 
ctatagcttt 
taatgaatgg 
ttatgtacgg 
acaaagggtt 
gaccaacctg 
aggatacaaa 
cctgtcccaa 
tttagaaatg 
gacttgctgg 
agagagtgca 
ttgctagact 
taaggttgaa 
tcacaaaatt 
cccatttccc 
taatcttttt 
tatatacagt 
gacatatgtt 
tgctacttta 



agtagaagag 
gggttggatg 
tctgtacaat 
tctgggcgcg 
cacctgaggt 
aaattacaaa 
ctgaggcagg 
cactgcgctc 
tggagcagtt 
aagaaaaaaa 
gtctgtttct 
ttattcctag 
ttctgttaac 
atgtctacat 
tctgtctcag 
ttagctcaaa 
agccatgcag 
aggcatgtgt 
aaagtttaag 
tacagaagaa 
gcagtagttt 
taagagaccc 
atactccctt 
ttaacaagtt 
ctccctagag 
gtatgcaact 
tactgcagta 
aatgtggcca 
aatgctgttc 



tgttatatgg 
tgaggaggta 
cctagtcatt 
gtggctcacg 
caggagtttg 
aattagctgg 
agactctctt 
cagcctgggc 
tcacagatgc 
aaatgaattg 
acacatataa 
ctataaaaac 
tttttgttaa 
ttacaaatca 
catggtcatt 
aaaaaaaaaa 
tactttgtgt 
ggttgtaccg 
agagttaata 
aagaatcaat 
agagtgtcag 
agatcttggg 
ttaattgatt 
tgagtgtgaa 
gcggctgtta 
gtgcacatgt 
tgcttatttg 
aatagggcat 
tccttccctg 



taaggttttc 
aagcagcaaa 
ccaaaaagaa 
cctgtaatcc 
agacaagtct 
gcatgggggc 
gaacccataa 
gacagagtga 
tgtttactgt 
gagcaggttc 
atctgaattc 
taagaatata 
aatttcattg 
gtgtattttt 
taaagagttt 
cctccaaggt 
tttgtgccaa 
taaaccaagc 
tataggtgct 
tatgattcag 
ggatcaaact 
aggggttttt 
aatataaaaa 
aagcatccat 
ttatcagttt 
atgtttacac 
actttgcttt 
ttcccagaca 
aaatgtcctt 



ctctccagta 
gttactgctt 
acattagtcc 
cagtactttg 
ggccaacatg 
acccacctgt 
ggcagaggtt 
gactccgtct 
tatactgtat 
acagaagcaa 
tgtgtacacg 
atttctttct 
aggggagtat 
tgatttatgc 
atcattgaga 
atattgtatc 
aaagggtagc 
atggtacctg 
gcatttttta 
cacaatactc 
gctaccttct 
gttgttattt 
ataggtaatg 
cccattctat 
ctctgggtcc 
aaatggtagg 
attgctttta 
atccttatag 
tttttttttc 



42240 
42300 
42360 
42420 
42480 
42540 
42600 
42660 
42720 
42780 
42840 
42900 
42960 
43020 
43080 
43140 
43200 
43260 
43320 
43380 
43440 
43500 
43560 
43620 
43680 
43740 
43800 
43860 
43920 



30 



ttttcttttt 
cacaatctca 
ttcccgagta 
tgttttagta 
ctgatctgcc 
ggctgtcctt
aagctttttc
ttataagttg 
atctccgtag 
ggcgggagga 
cgtctctaca 
caggagactg 
atggtgccac 
atggtatttg 
atctaactac 
tgttaatttc 
tatataaatg 
atttcacatg 
ttttaccttc 
acggacgtct 
gggtttaggt
acctgataat 
ctgtgttgaa 
tagtcattct 
aatatgaagg 
tgtgttttta 
attttaattt 
aatgtcttat 



ttttttttga 
gctcactgca 
gttgggacta 
gagacagggt 
tgcctcagcc 
tcttttaatc 
agtgatcttt 
ctaattatat 
tgccaggaac 
tcgcttaagg 
aaaaataaat 
aggcagaagg 
tgtacttcag 
ttttggacgt 
ctcaggttac 
tgttattttc 
caaaaacttt 
cacattttat 
tctgttgcat 
ataatgacag 
taactaacat 
gttgatttga 
aatatagtac 
aaacttgctt 
tatacattcc 
tatggatagc 
tttatgtatc 
agtacttgat 



gacagagtct 
acctccacct 
caggcgcgtg 
ttcactgtgt 
tcccaaagtg 
ttttcaaatt 
cttttgtacc 
ttaataaata 
tgtggctcat 
ccaggagctt 
tagctgggca 
ataacttgag 
cctgggtgac 
attagatatt 
ctaacctgaa 
gtgacttcat 
ttatcatgta 
ttgttcatta 
agagaagatg 
tactaacttg 
aaaataataa 
ttttataatg 
ctttatcatc 
tatttgcata 
cgtaagtgat 
tatactgatt 
aatttttatt 
agaagaaatt 



cgctctgtca 
ccccggttta 
ccaccacgcc 
tagccaggat 
ctgggattac 
ctagccagtt 
atgcagtctt 
cctaaatggt 
gtctgtaatc 
gagcccagcc 
tagtggtgtc 
ctcagggagt 
agagcaagac 
cagtaaatat 
taaggcctat 
ttttactaat 
acaatctggt 
tcttttttct 
caaccagttt 
gcatgccgca 
agcttgcatg 
gtaggtttaa 
atagtatata 
ttttgcaacc 
taaaataatc 
tttttcatga 
gcatatcaat 
tgaaacttca 



cccaggctgg 
agccattccc 
cggatacttt 
cgcctcaatc 
agacgtgagc 
catagcccac 
ttaaaaaatc 
agaaattgat 
cctgcacttt 
tgtggaacat 
tgtatgtagt 
tgaggccaca 
caagaccgat 
tttctggtga 
ttttaacata 
tatattttaa 
aacaccttga 
ttttgtttat 
tgccattttc 
atggacatct 
caaagtaaga 
ttgttcatgt 
aacatgcaac 
tctggttttt 
tttttttctg 
aatagtgttt 
tttcatttat 
catagtgagg 



agtgcagtgg 
ctgcctcagc 
tttgtttgtt 
tcctgacctc 
cactgcgcct 
ttgctctgta 
tacagtttta 
tcatttttta 
gggaggctaa 
attgagaccc 
gtcaactact 
gtgagccatg 
aattatctcc 
tgatagtgat 
gccattcact 
aagataattt 
gtaatttgtc 
tttttctgtt 
caagtcacaa 
ccagtcaggt 
gtcttactta 
tttcacaggg 
aagtcaggta 
cagattataa 
ttgtggcttt 
tctaagacac 
agtatatgtg 
agaaacatta 



43980 
44040 
44100 
44160 
44220 
44280 
44340 
44400 
44460 
44520 
44580 
44640 
44700 
44760 
44820 
44880 
44940 
45000 
45060 
45120 
45180 
45240 
45300 
45360 
45420 
45480 
45540 
45600 
cagtattatt
agaagattaa 
ttctctgaca 
atgtacctgt 
tcaaggtaat 
cagtattttc 
cctgcagtga 
ttccctaccc 
ctttttaaaa 
tcttggacac 
ggggtgtaga 
gggattggtg 
tccatagtag 
acatcctcac 
gtaagatacc 
tttttcatat 
ttttagcaat 
ataccagaac 
taattatata 
ggtttgttac 
tacattaggt 
cattgtgtga 
agtgagaaca 
tccagcttca 
tattccatgg 
cattgattcc 
tctttatagc 
tcaaatggta 
tgaactagtt 



tgctatggat 
atgtctttct 
tctgagctag 
tcatggggta 
tatcatttcc 
cttctagcat 
tgtagaacaa 
ttttcagcct 
ataatattct 
ctaggttgat 
tgtctgttcg 
ggtcatatgg 
ctgtactagt 
cagcatttgg 
ttgttgtggt 
ctttgttttt 
agtttcatgt 
aaccttaggt 
tatattttta 
atatgtatac 
atatctccta 
tgttccccac 
tacggtgttt 
tccatgtccc 
tgtatatgtg 
aagtctttgc 
agcatgattt 
tttctagttc 
tacgttccca 



taactctatc 
gttttgtttt 
ctttttgttt 
cacagtgatg 
atcatcccaa 
ttgaaatgat 
tggagcttac 
ctagtatcct 
gtgtgtatat 
tctgtatctt 
atatgatgat 
tagttctgtt 
ttagattccc 
tacttttttt 
tttgatttgc 
gacggtccag 
gcagggctca 
taggctataa 
attacacttt 
aagtgccatg 
atgctacccc 
cctgtgtcca 
ggttttctgt 
tacaaaggac 
ccatattttc 
tattgtgaat 
ataatccttt 
tagatccttg 
tcaacagtgt 



cttcataaac 
ctttgggtag 
aattaaaact 
ttttgataca 
acatttatca 
ataatatatt 
tctttctatc 
ctgttctact 
ataccacatt 
ggctgttgtg 
tttctttcct 
tgtagttttt 
aaaagtagcg 
gtctttttga 
atttccctca 
ctagttttat 
aattatattt 
aacaactgcc 
acgttctagg 
ttggtgtgcc 
tcccctctcc 
agtgttctca 
ccttgcgata 
atgaactcat 
ttagtccagt 
agtgccacag 
gggtatatac 
aggaatcgcc 
aaaagtgttc 



ttttggatat 
gggactctcc 
ttttttttag 
tatagtatat 
ttccttgtgt 
atattgttac 
tagctatacc 
ttttatttct 
ttctttattc 
aatagtgctg 
ttggataaat 
gaggaacctc 
tataagagtt 
taatagccat 
taattaacga 
tggttacttt 
aatattactt 
ttctttttct 
gtgcatgtgc 
gcacccatta 
cctcacccca 
ttgttcagtt 
gtttgctcag 
cctttttgat 
ctatcattga 
taaacataca 
ccagtaatgg 
acactgtctt 
ctgtttctcc 



caacttctac 
atttcttagg 
ttgacagata 
ggtgattaga 
tggaaacatt 
ctataatcat 
tttgtatcct 
atgagattaa 
attcgtctgt 
cagtaaacat 
tcccaatagt 
catactcttc 
cactttctac 
cctaactggg 
tgttgagcat 
tttttttttt 
ttttcaaatt 
tttggcataa 
acaacatgca 
acttgtcatt 
cgacaggccc 
cccatctgtg 
aatgatggtt 
ggctgcttag 
tggacatttg 
tgtgcatgtg 
gatggctggg 
ccacaatggt 
acatcctctc 



45660 
45720 
45780 
45840 
45900 
45960 
46020 
46080 
46140 
46200 
46260 
46320 
46380 
46440 
46500 
46560 
46620 
46680 
46740 
46800 
46860 
46920 
46980 
47040 
47100 
47160 
47220 
47280 
47340 
cagcacctgt
ctcattgtgg 
tgtcttttgg 
tttttgatgg 
ggatatcagc 
cctgttcact 
ccatttgtca 
gcccatgcca 
ttttatagtt 
aggtgtaagg 
ccatttatta 
cacatggttg 
atctctgttt 
gaagtcaggt 
gcaggctctt 
agtcatttgt 
cattttcatg 
gtcctctttt 
ccttgtaagt 
actcatgatt 
ttgattttgt 
gagatgatgg 
ttcttttcct 
ttccaacact 
caaagggaat 
aatagctctt 
catgaagggc 
gtttttgtct 



tgtttcctga 
ttttgatttg 
ctgcataaat 
ggttgtttga 
cctttgtcag 
ctgatggtag 
attttggctt 
gtgcccatgc 
ttagaactaa 
aagggatcca 
aataggaaat 
tagatgtgtg 
tggtatcagt 
agcacgatgc 
ttttggttcc 
agcttgatgg 
atattgattc 
atttcattga 
tggattccta 
tggctgtttg 
atcctgagac 
ggttttctaa 
aattggatac 
atgttgaaca 
gcttccagtt 
attattttga 
tgttgaattt 
ttggttctgt 



gtttttaatg 
catttctctg 
gtcttctttt 
tttttttctt 
atgggtagat 
tttcttttgc 
ttgttgccat 
cagtgtcctg 
catttaagtc 
gtttcagctt 
cctttcccca 
gtattatttc 
accatgctgt 
ctccagcttt 
atatgaactt 
ggatggcatt 
ttcctatcca 
gcagtggttt 
ggtattttat 
tctgttattg 
tttgctgaag 
atatacaatc 
cctttatttc 
ggagtggtga 
tttgcccatt 
gatacatccc 
tgtcaaaggc 
ttatatgatg 



atcgccattc 
atggccagtg 
gagaagcatc 
gtaaattttt 
tgcaaaaatt 
tgtgcagaaa 
tgcttttggt 
aatggtattg 
tttaatccat 
tgtacgtttg 
tttattgttt 
tgagggctct 
tttgattact 
gttcttttgg 
taaagtagtt 
gaatctataa 
taagcatgga 
gtagttctcc 
tctctttgaa 
gtgtatagga 
ttgcttatca 
atgtcatctc 
tttctcctgc 
gagagggcat 
cagtatgata 
atcaatacct 
cttttctgca 
gattacgttt 



taactggtgt 
atgatgagca 
tgttcatata 
ttaagttctt 
ttctcccatt 
ctctttagtt 
gttttaggca 
cctagatttt 
cttgaattaa 
gctagccagt 
ttgtcaggtt 
gttctgttcc 
gtaccttcgt 
cttaggattg 
ttttccaatt 
attaccttgg 
atgttcttcc 
ttgaagaggt 
gcaattgtga 
atgcttgtga 
gcttaaggag 
caaacaggga 
ctgattgccc 
ccctgtcttg 
ttggctgtgg 
agtttattga 
tctattgaga 
attgatttgc 



gagacggtat 
ttttttcatg 
ctttgcccac 
tgtagattct 
ctataggttg 
taattagatc 
tgaagtcctt 
cttctagggt 
tttttgtata 
tttcccagca 
tgtcaaagat 
attggtctat 
agtatagttt 
tcttggcaat 
ctgtgaagaa 
gcagtgtggc 
atttgtctgt 
ccttcacatc 
atgggagttc 
tttttgcaca 
attttgggct 
caatttgact 
tggccagaac 
tgctagcttt 
gtttgtcata 
gagtttttag 
taatcatgtg 
atatgttgaa 



47400 
47460 
47520 
47580 
47640 
47700 
47760 
47820 
47880 
47940 
48000 
48060 
48120 
48180 
48240 
48300 
48360 
48420 
48480 
48540 
48600 
48660 
48720 
48780 
48840 
48900 
48960 
49020 
ccagccttgc
ctgctggatt 
tattggtcta 
gcctcataaa 
aggaatggta 
cctggacttt 
ggtctattca 
gatttatcca 
ctgatggtag 
gcatctattt 
ttgttgatct 
gtctctatct 
gaaggtgttt 
gatctttcct 
ttaaatgtgt 
atctttattt 
agtttccatg 
gcactgtggc 
gctttagttc 
gtatattctg 
agagctgagt 
ttgaagtctc 
tgctttatga 
tcttgttgaa 
gttggtttaa 
catttgcttg 
tgagatgggt 
gtctgtgtct 
ttgaatttga 



atcccaggga 
cggtttgcca 
aaattctctt 
atgagttagg 
ccagctcctc 
ttttggttag 
gggattcagc 
tttcttctag 
tttgcatttc 
gattcttctc 
tttcaaaaaa 
ccttcatttc 
gctcttgctt 
gttttctctt 
cccaaagatt 
ctgccttcat 
tagttgtgtg 
ctgagagaca 
caactatgtg 
ttgatttggg 
tcagttcctg 
ccagtattat 
atctgggtgc 
ttgatccctt 
agtctgtttt 
gtagatcttc 
ttcctgaata 
tttaattgga 
tcctgtcatt 



tgaagccaac 
gtattttatt 
tgttgtgtct 
gaggattccc 
tttgtacctc 
taggctatta 
ttcttcctgg 
attttctagt 
tgtggaatcg 
tcttttcttc 
ccagctcctg 
tgctctgatc 
ctctagttct 
gtgggcattt 
ctgatatgtt 
tttcttatat 
gttttgcgtg 
gtttgttgta 
gtcaattttg 
gtttagagtt 
gatctgtctt 
tgtgtgggag 
tcctgtattg 
tagcattata 
atcagagagt 
ctccatccct 
cagcacactg 
gcatttagcc 
atgatgttag 



ttgattgcgt 
gaggattttt 
ctgccaggct 
tctttttcta 
tggtagaatt 
attattgcct 
tttagccttg 
ttatttgagt 
gtggtgatat 
tttattagtc 
gattcattga 
ttagttattt 
tttaatggtg 
agtgctataa 
gtgtctttgt 
acccagtagt 
agtttcttaa 
atttctgttc 
gaataggtgt 
ctgtagatgt 
gttgatctgt 
tctaagtctc 
ggtgcatata 
tgatggcctt 
tggattgcaa 
ttattttgag 
atgggtcttg 
catttccatt 
ctggttattt 



tggataagct 
gcgttgatgt 
ttggtatcag 
ttgattggaa 
cagctgtgaa 
caatttcaga 
ggaggctgta 
agaggtgttt 
cccctttatc 
ttggtagcag 
ttttttgaaa 
cttgccttct 
atgttagggt 
atttccctct 
tctcgttggt 
cattcaggag 
tcctgagttc 
ttttacattt 
ggtgtggtgc 
ctattaggtc 
ctaatattga 
tttgtaggtc 
tatttaggat 
ctttgtctct 
accctgcttt 
cctatgtgtg 
actcgttatc 
taaggttaat 
tgctcgttag 



ttttgatgtg 
tcatcaggga 
gatgatgctg 
tcatttcaga 
tccatctggt 
gcctgttatt 
tgtgtccagg 
atagtattct 
atttcttatt 
tctatcagtt 
ggttttttgt 
gctagctttt 
gtcaatttta 
acacactact 
ttcaaagaac 
caggttgttc 
tagtttgatt 
gctgaggagt 
tgagaagaat 
cacttggtgc 
cagtggggtg 
tctagggact 
agttagctct 
tttgatcttt 
ttttgttttc 
tctctgcacg 
caatttgcca 
attgttatgt 
ttgatgcagt 



49080 
49140 
49200 
49260 
49320 
49380 
49440 
49500 
49560 
49620 
49680 
49740 
49800 
49860 
49920 
49980 
50040 
50100 
50160 
50220 
50280 
50340 
50400 
50460 
50520 
50580 
50640 
50700 
50760 
ttcttcctag
tgttcatttc 
aaaatctgtc 
tagtttggct 
tggcccccac 
gggcttccct 
catttcaact 
tctttgtggc 
gaacttctcc 
actttcaggt 
gaggttctgt 
attcatttga 
gctcatgcat 
ggtcttctcc 
agcttctttg 
tcgtctgagg 
tgctggtgag 
agcttttctg 
atggtgacgt 
taacactcag 
ctgtttgcct 
tgttgctgtc 
gcgtcagtca 
cacttgagga 
ctctctttag 
ttcagctatc 
gtggtggact 
caatggcaga 



cctcgatggt 
catgttcagt 
agcatttgct 
ggatatgaaa 
tctcttctgg 
ttgtgggtaa 
ttggtgaatc 
attctctgta 
tggataatat 
acacccgtca 
tcgtttcctt 
tcttcaatca 
gcatcacgta 
atgctgttta 
caatggtttc 
cctacttctg 
gagctgtgtt 
ctctggtttc 
acagatgggg 
gaccctcagc 
gggtatcacc 
tgatccttcc 
gcccgtatgg 
ggcagtctgt 
agctgtcaga 
ccatgtcccc 
ccacccagtt 
tgcccccgcc 



ctttacaatt 
gcttccttca 
tgtctttaaa 
ctctgggttg 
tttttagagt 
cccgaccttt 
tgacaattat 
tttcctgaat 
cctgcagagt 
gacatagatt 
ttactctttt 
ctgataccct 
gttttcgtgc 
ttctagttag 
gaacatcctc 
tcagcttgtc 
cctttggagg 
tccccatctt 
ttttggtgtg 
tgcaggtctg 
agcggaggct 
tctggaacct 
ggaggtgtct 
ccgttcgccg 
cagggacctt 
agaggtggag 
caagcttcct 
tccagcctct 



tggcatgttt 
ggagctcctg 
ggattttatt 
aaaattcttt 
ttctgccaag 
ctctctggct 
gtgtcttgga 
ttgaatgttg 
gttttccaac 
tggtcttttc 
ttctctaaac 
ttcttccact 
catggttttc 
ccatttgtct 
ctttagctcg 
aaagtctttc 
agaagaggcg 
tgtggcttta 
gatgtccttt 
ttggagtttg 
gcagaacagc 
tcgtctgaca 
cccagttagg 
atctcaaact 
taagtctgta 
tctacagagg 
agctgctttg 
ttgccgcctt 



ttgcagtggc 
taagcaggcc 
tctccttcac 
cctttaagaa 
agatcagctg 
gcccttaaca 
gttgctcttc 
gcctgccttg 
ttggttccat 
acatagtcct 
ttctcttctg 
tgatcgaatc 
agctccatca 
aatctttttt 
gagaactttg 
tctgtctagc 
ctctgaattt 
tctacctttg 
ctgtttgtta 
ctggaggtcc 
aaatgttgca 
ggggtaccca 
ctacttgggg 
ccatgctggg 
gaagttactg 
cagtcaggcc 
tttacccact 
gcagttcgat 



ttgtaccggt 
tggtagttac 
ttatgaagct 
tgttgaatat 
ttagtctgat 
ttttttccta 
ttgagtagta 
ctaggttggg 
tctccccgtc 
atatttcttg 
gcttcatttc 
ggctactgaa 
ggccatttaa 
caaggttttt 
ttattaccca 
tttgttccgt 
tagaattttc 
gtctttgatg 
attttccttc 
actccagacc 
gaacgacaaa 
ggtatatgag 
atcagggacc 
agaactacta 
ctgccttttg 
tccttgagct 
caagcctcag 
ctcagactgc 



50820 
50880 
50940 
51000 
51060 
51120 
51180 
51240 
51300 
51360 
51420 
51480 
51540 
51600 
51660 
51720 
51780 
51840 
51900 
51960 
52020 
52080 
52140 
52200 
52260 
52320 
52380 
52440 
tgtgctagca
aatctcctgg 
tccctatttt 
accccttgcg
cacccactgt 
atcagctgtc 
ccatcttgga 
gagaagagat 
tcccttgtgg 
gtcagtggca 
gtatcgatat 
agaaggcact 
gttaagtttc 
attctccagc 
atttcatttc 
caaagcaatg 
tccctagtac 
cgactctttt 
caatcttggc 
gattctccca 
ctaatttttt 
actcctgacc
gagccactgt 
aaatactgga 
aatgtaaaca 
atttcgcata 
catattggct
agatcactca 
gttttacttg 



gtgagcgagg 
tgtgccgttt 
ccaggtacca 
cttcccaggt 
ctgacaatcc 
ttctgtgtcg 
accctgcctt 
agagtaaaag 
cttcacatct 
gtttccctct 
aagtattttg 
tactgaaaaa 
taaaatatgt 
cctgaagagt 
acatgctaga 
ttctcttcct 
aagttgaata 
ttttttgaga 
tcactgcaac 
cttcatcctc 
gtatttttag 
ttaagtgatc 
gcctagccaa 
gtattttgca 
ttgcaaaatg 
agggatactc 
ttatagctaa 
gacctattag 
gatgagttct 



ttccgttggc 
gctaagacca 
tatgtcacgg 
gaggcaatac 
ccagtgagat 
ctcacactgg 
cttcattcat 
gatttctata 
cccctaagga 
tcccattcct 
cctgtttctt 
aaaaaaacaa 
atttgagatc 
tggtctgtct 
ccccatcatg 
gcaacctgtt 
tccctaatat 
tgaagtcttc 
ctccgcagcc 
ccaagtagct 
tagagatgag 
cacctgccct 
atatccaaaa 
ttttggattt 
caaaaaaatg 
aacccataat 
tttacacagc 
tagtttatta 
ttcacctctc 



atgggaccct 
ttggaaaagc 
cttcccttgg 
cccgcccttc 
gaacccagtt 
gagctgcaga 
atgtaataca 
gacagaagaa 
cttcttatgt 
tcctctgtct 
ttagcgtggc 
agaaatgtaa 
ccagtaattc 
ttcctttcct 
ttttcctgct 
tgaaaaaatt 
gaaatctgaa 
ctctgtcact 
tcccctaacc 
gggactacag 
gtttcaccat 
ggcctcccaa 
ctttttgagc 
ttgggttaag 
taaaaaccct 
cttttctttt 
attgatgttt 
gtgtcactca 
ttgggaaata 



ctgagccagg 
ccagtattag 
ctaggagagg 
agctcacact 
cctcagttgg 
ctggagctgt 
aaacttctaa 
acagttggtc 
tggtttggtc 
gatttaaaat 
tgtgaagggc 
gagtccatca 
tactaggata 
tgttatcttg 
gcattcccca 
gaactttgtc 
atccaaaatg 
caggctggag 
ccatccctgt 
gcacctgcca 
gttggtcaag 
agtgttggga 
gctgacatga 
gatgctgaac 
aagcagttct 
ctgttttctg 
aatacacagt 
cttctggaac 
gtcataccaa 



catgggatat 
ggtgggagtc 
gaattcccca 
atgtgggctg 
aaatgcagaa 
tcctatttgg 
ggttttagtg 
atcaactctt 
ttacagtata 
gctgtttcaa 
tgacattttc 
catataaata 
aatagcaaaa 
attctctttt 
ctccaccctc 
ttatttcata 
ctctaaaatt 
tgcggtggcg 
gggttcaagc 
ccacacccaa 
ctagtctgag 
ttacagtcat 
tgctcaaagg 
cagtaagtat 
ggtcccaagc 
gttggaaggg 
gagtccaagt 
attctgtgat 
agtctgctta 



52500 
52560 
52620 
52680 
52740 
52800 
52860 
52920 
52980 
53040 
53100 
53160 
53220 
53280 
53340 
53400 
53460 
53520 
53580 
53640 
53700 
53760 
53820 
53880 
53940 
54000 
54060 
54120 
54180 
ttactacaat
gcaaagaaaa 
gagtgggccc 
ttaaagaatt 
actagacaat 
acagacaaca 
tgcaggtctg 
tggtttcata 
aaacattttg 
atcaaattgt 
tcccaggctg 
taatttttgt 
tcctgggctc 
gctactgtgc 
tcactctgtc 
tcccgggttc 
gccaccatac 
aggatgatct 
tacaggcctg 
agattatacc 
ctggagtgca 
tttcctgtct 
atttttgtgt 
ctgacttcag 
caccacgcct 
ttctgtttct 
tttaactctt 
ctataatgat 



gtgttatcta 
agtaaagtat 
agatattgga 
gaaagaaaat 
aaaaataata 
gcctcaacag 
tagaaattac 
tacaatgctt 
tgctttgatt 
atactttatt 
gagtacagct 
attttttgta 
aatcatttct 
ctggctggta 
gcccagggtg 
aacccattct 
ccagctgatt 
cgatctcctg 
agccaccatg 
tcagtaaatt 
gtggcacgat 
cagcctcccc 
ttttagtaaa 
gtgatccacc 
ggccttaatt 
ttcctctcac 
tttgctattt 
tccttttttt 



ttatacattg 
aacccttact 
tttagcagac 
atggtatcag 
ttctaaagtt 
cagattagag 
taaattgatg 
taactctttt 
ttaatgggtg 
tatatattta 
gaaattagtt 
gagatggggt 
gccgcctcag 
tagctttttt 
gagtgcagtg 
cctgcctcag 
tttgtattct 
acctcgatcc 
cctggctgtt 
tttatttagt 
ttcagctcac 
ggtagctgga 
gatggggttt 
cgcctcggcc 
tttaaatact 
gtagcaacct 
ttttctctta 
tttctatcag 



tccaattttc 
caagaaaaaa 
aaagacttca 
ttaaacagga 
gaaaagtgta 
atagcaaaag 
actctcacgt 
tgctattttt 
tatcacagtt 
tttttttttg 
gggactacag 
ttccccatgt 
actcccagag 
tttttttttt 
gcgcgatctg 
cctcctgagt 
tagtagaaac 
acctgccttg 
ggtatacttt 
ttttgagact 
tgcaacctct 
attagaggtg 
caccatgttg 
tctcaaagtg 
gtaaggctta 
tcacccctag 
ctttctacta 
tttttgataa 



tactcaaaat 
aagcaatcag 
aagcagctat 
aatctaagta 
gttactgaaa 
aaagattcag 
agcaactttc 
tctcttactt 
atatacatct 
agatagggtc 
gtgggcacca 
tggccagtct 
tgttgggatt 
tttttttttg 
ggctcactgc 
agctgggact 
agagtttcac 
gcctcccaca 
aaatggatgt 
gagttgctct 
gcctcctggg 
tgtgccacca 
gccaggctgg 
ctgggattac 
taaagaaaag 
tttggtttca 
tatttccaaa 
tcattgactc 



tactagacag 
tagaaactgt 
tataatatgt 
gatgatataa 
ttaaaaattt 
tgactttgaa 
acccgtagtt 
tctgtgatgg 
ttacaaactc 
tgactctttc 
ctgtgccagc 
ggtctcaaac 
ataggtgtga 
tgacagagtc 
aacctctgcc 
acagatgcgt 
catgttggcc 
gtgctggcat 
aattcattgc 
gttgcccagg 
tttaagcgat 
tgatcagatt 
tctcgaactc 
aggtgtgagc 
aatattcccc 
tatacagtgt 
tacaatgctt 
cttatggtca 



54240 
54300 
54360 
54420 
54480 
54540 
54600 
54660 
54720 
54780 
54840 
54900 
54960 
55020 
55080 
55140 
55200 
55260 
55320 
55380 
55440 
55500 
55560 
55620 
55680 
55740 
55800 
55860 
aagaagactt
tttagttttg 
tctgtatata 
tacatatata 
tacatatata 
tacatatata 
agttatatca 
cagttatata 
ttgttttttt 
gtggtgctat 
cagtctcccg 
tctttagtag 
tgatccaccc 
tccctataca 
tatgtatatg 
tatagtatat 
tatacactat 
atagtatata 
atatagtata 
gtatactata 
gtatactata 
tatacagtat 
agagtatata 
gtgtgtatag 
actatatcgt 
tacagtatac 
agtatatata 
gtgtatagcg 
tgtatagtgt 



aattcccttc 
atataactaa 
gttatatatg 
gttacatatg 
gttacatatg 
gttacatatg 
aaataactat 
tgtatttgta 
tgtttttttt 
cttggcttac 
agtacctggg 
aaatggggtt 
gcctcggcct 
tatatagtta 
taactatata 
atatactata 
atactatata 
tagtatatat 
catatagtac 
tatagtatag 
tagtatatag 
actatatcgt 
tacagtatac 
agtatatata 
gtgtatagag 
tatatagtgt 
cagtatacta 
tatatataca 
gtatagcgta 



tgtcactctt 
aaataactat 
tatatgcaca 
tatatgcata 
tatatgcata 
tatatgtata 
gtatatatag 
tatgtgtgta 
tggagatgga 
cgcaacctct 
attacaggca 
tcaccatgtt 
cccaaagtgc 
tatataccta 
tatagtatat 
tagtgtgtat 
gtgtatatat 
agtgtatata 
actgtatagt 
tatacatagt 
agtatatata 
gtgtatagag 
tatatcgtgt 
cagtatacta 
tatatataca 
gtatagagta 
tatagtgtgt 
gtatactata 
tatatacagt 



catatattaa 
atgtatatat 
tacatatata 
tacatatata 
cacatatata 
tgtatataca 
ctgtaaatgt 
tacatatata 
gtcttgccct 
gcttcccagg 
cgtggcacca 
ggccaggctg 
tgggattaca 
tagttatttt 
atagtatata 
atatagtata 
agtatatgta 
tactgtatat 
atatatagta 
atactatata 
cagtatacta 
tatatataca 
gtatagagta 
tatcgtgtgt 
gtatactata 
tatatacagt 
atagcgtata 
tagtgtgtat 
atactgtata 



tataactaat 
aactatatat 
gttacatatg 
gttacatatc 
gttacatatg 
tatatagtta 
atatataaac 
gttttttttg 
gtcccccagg 
ttcaagcaat 
cgccaggcta 
ttctcaaact 
ggcgtgagca 
tagttatatc 
tatactatat 
tatatagtgt 
gtatatatag 
ataggtgtac 
tagtatatat 
gtatatagag 
tatcgtgtgt 
gtatactata 
tatatacagt 
atagagtata 
tagtgtgtat 
atactatata 
tatacagtat 
agcgtatata 
gtgtgtatag 



atatatattg 
gtataagcta 
tatacacaca 
tatatgcata 
tatatgcata 
tatatgtatt 
tatatgtata 
tttttttttt 
ctggaatgca 
tctcctgctt 
attttttgta 
cctgacctcg 
ccgcgcctgg 
aaaataacta 
agtgtgtata 
atatatcgta 
tatatatagg 
atagtatact 
agtatacata 
tatatataca 
atagagtata 
tcgtgtgtat 
atactatatc 
tatacagtat 
agagtatata 
gtgtgtatag 
actatatagt 
tacagtatac 
cgtatatata 



55920 
55980 
56040 
56100 
56160 
56220 
56280 
56340 
56400 
56460 
56520 
56580 
56640 
56700 
56760 
56820 
56880 
56940 
57000 
57060 
57120 
57180 
57240 
57300 
57360 
57420 
57480 
57540 
57600 
gtatactgta
tatagtatac 
gtaatatcaa 
agccacgtat 
ttctgatctt 
taatttgctt 
aaactcttag 
tcacctcaac 
gtctgctgtc 
tatatatttt 
agtagtagta 
ctccatctcc 
ggcacacacc 
gttggccagg 
agcgctagga 
tgccctgttg 
cctgaatgtg 
tggagtgcaa 
ctcctgcctc 
tttttgtgtt 
tgaccttgtg 
cacgcccagc 
tgattttata 
tttcctatac 
aaagtaaaat 
catttgtctt 
aatattattt 
caacactgga 



tagcgtatag 
tgtatagtat 
aaaaccatag 
tatatattaa 
tttttttgtt 
tttaactaga 
tctttccagt 
attttggaca 
attctcatac 
taatatatat 
gtagtagtag 
cgggttcaag 
accatgcctg 
ctgatcttaa 
ttacaggcat 
ctctatcaag 
gctttttttt 
tggagtgatc 
agcctcctga 
tttagtagag 
atccgcccac 
ctaatgtgga 
atagttaact 
tagtttagag 
tgaggctttt 
cacagtaaac 
aggagtggtg 
caaataacct 



agtatatata 
atagagtata 
ctaagatttt 
tgatattttt 
ctctaacttt 
agacctcctt 
gtctgaaata 
tacctcattt 
ttgttccttg 
acatattttt 
tagtagtagt 
tgattctcct 
gctaattttt 
actcctgacc 
gaaccactgc 
tagtaggtct 
ttttttttcc 
ttggctcacc 
gtagctgaga 
acggggtttc 
cttggcctcc 
ttttgttttt 
tgtagtaaaa 
tctttagttt 
gtcttttttt 
attgtaaata 
agatttaaac 
tttcaactta 



tagtatactg 
tatagttaca 
tatgatttag 
tcccttttga 
atattatagg 
tagtagttct 
ccctcatttt 
ccaacaaaca 
tggaaatgtc 
atttatttaa 
agtagtagtt 
gcctcagcct 
gtattttgag 
tcagatggac 
gtctggcctt 
tatttattct 
taagacggag 
gcaatctctg 
ttacaggcgc 
accacattgg 
gaaagttctg 
ttttttaact 
gtaaactggt 
tcttaaaccg 
ttggccactc 
aacactgatg 
atttacaaac 
atcagaggtg 



tatagtgtat 
tatacatata 
caaaatattg 
gtttttcttc 
tatatactct 
tttagtgaat 
aatgtgacag 
tttgttactt 
tatatataaa 
aaatgtataa 
tttttttttt 
cctgagtagc 
tggagatggg 
cacctgcttt 
aagttattat 
agaacctcgt 
tctcactctg 
cctccttggt 
acgccaccac 
tcaggcttgt 
ggattacagg 
tgaattttat 
tgtctaaata 
tattaagaga 
taatttgctg 
tttgttgcgt 
ctgtaatata 
ttctataaat 



agagtatata 
gttattttta 
tttaccccaa 
cccacaggtt 
tttctctgtt 
atctgtggtt 
gtcacttttc 
ctgatagaga 
gtctcagaac 
ttaagttagt 
tagatggagc 
tgggattaca 
gtttcaccat 
ggcctcccaa 
tgactattga 
tttaaggtat 
ttgcccaggc 
tcaagcaatt 
acctggctaa 
ctcaaactcc 
cgtaagccac 
ttatacttcc 
aataagctga 
tttgattaca 
actgacttaa 
agtatgtaaa 
tatagtacag 
gttcatttgc 



57660 
57720 
57780 
57840 
57900 
57960 
58020 
58080 
58140 
58200 
58260 
58320 
58380 
58440 
58500 
58560 
58620 
58680 
58740 
58800 
58860 
58920 
58980 
59040 
59100 
59160 
59220 
59280 
atatatacag
taaaattgat 
tggattgcag 
gccaccacgc 
ggctggtctt 
gataacaggc 
gtgggaacag 
gctacttttt 
tttgagagct 
tcactctgtc 
tcctgagttc 
accaccacgt 
cagcctggtc 
gggattacag 
gttccccaaa 
ttttatttga 
ctcacttcaa 
ctgggattac 
tggctcactc 
aggagttcga 
ttttaattag 
tgggaggatg 
actgcactcc 
ggggaaaaaa 
ggtacatgtt 
taccttggtt 
tggttttact 
acacactagt 
ttcaggccac 



cccatataat 
tattattatt 
tggtgtgatc 
ccagctattt 
gaactccata 
gtgagctact 
actttcatag 
tttcctctgt 
ttatttgctt 
acccaggctg 
aagtgattct 
ctgactgatt 
ttgaactact 
gtgtgagcca 
atgaagtagt 
gatggaatct 
cctctgactc 
aggcacaaat 
ttgtaatccc 
gaccagccta 
tccggtctgg 
gatcacttga 
agcccaggcc 
cctaataacc 
ctcattgaaa 
ttacaaatgt 
ttttccttgt 
aattcactgc 
catagagcac 



acttagcata 
attttttttt 
agggctcact 
ttgtattttt 
cctcaggtga 
gcacccggcc 
tgtttatagt 
acatcctatc 
cataactttt 
gagtgcagta 
cctgcctcag 
ttttgtattt 
gacatcaggt 
tcgcgcccgg 
ttcttcttaa 
ctctctgtcg 
cgtggttcaa 
actgttttta 
agcactttgg 
ggcaacatag 
tggtgcacaa 
gcccaggaag 
acaggtttca 
atattcctat 
tcttattttg 
ttgcttgtct 
cagaaagtgg 
ctcgttcaaa 
ctagttacag 



tgtaaaagca 
tgagacagag 
gcaatcgggt 
agtggagatg 
tctgcccgcc 
taaaattgat 
tataatagtt 
cctaatgcct 
tttttttttt 
gtgcaatctc 
cctccgaagt 
ttagcagaga 
gatccgtcca 
cgcttcataa 
aaatactgtt 
ccaggctgga 
gcaattctcc 
ataagaaaat 
gaggccaagg 
tgagtgagac 
ctgatgtccc 
attaaggcca 
aaaaaagaga 
ctacctagtg 
aaatattttt 
ttatgttcag 
agctgttcca 
aacagttatg 
tggtttatcc 



catctttata 
tccctctcag 
agctggaact 
gggtttcacc 
tcagcctccc 
tagtttttaa 
tggaatcaaa 
ttactatatt 
tttttttttg 
ggcttactgc 
agctgggatt 
cgaggtttca 
cctcgatctc 
cctttttata 
ttttgttttt 
gtgcagtggc 
tgcctcagcc 
tagaataggg 
tgggaggatt 
cctgtttcta 
agctacttgg 
caatgagctg 
aaattagaat 
accaccccta 
gaattaaaaa 
ctacaatttt 
aaaagaaaag 
aaaactggat 
atggtttctg 



ctctttcaat 
tcacccaggc 
acagggatgt 
atgttggcca 
aaatagctgg 
gaccctcttg 
gtagtcgagt 
ttgtttctta 
agacagagtt 
aacctccgcc 
ataggccccc 
ccatgttggc 
ccaaagtgct 
ttgtctttct 
ttgttttttt 
acgatctcgg 
tcccaggtag 
acaggcacaa 
gcctgagccc 
taaaaaaaat 
gagtttgagg 
tgatcatgcc 
tatacgttag 
tgaagatcct 
aatatattat 
ctaatcacaa 
accccttaac 
ctgcaggcct 
gagtgaaaca 



59340 
59400 
59460 
59520 
59580 
59640 
59700 
59760 
59820 
59880 
59940 
60000 
60060 
60120 
60180 
60240 
60300 
60360 
60420 
60480 
60540 
60600 
60660 
60720 
60780 
60840 
60900 
60960 
61020 
gggatctggt
tctttttgca 
tttataatac 
tgagactctt 
gcctcccagg 
tgcaccacca 
tcaggctggt 
tgggattaca 
acttgctata 
agtctgtttt 
ctttgtaaat 
taaaatttgt 
taatgccaaa 
tttatggatt 
ttaacttatt 
ctttaagggt 
tcgtaagaaa 
aatgaatgaa 
atattttaat 
ggtttacata 
actgatttaa 
gcttcatctt 
ttagtggaac 
tgcaagaaat 
tttctataat 
gctatactaa 
aaaacattaa 
taatttccaa



cctgctccta 
aatagaaaaa 
tttagagaat 
gttttccagg 
ttcaagcgat 
cgtcgagcta 
ctcaaactcc 
ggcgtgagcc 
aaactttttt 
atataatgct 
gttttatacc 
ttttattttg 
tatagttctc 
tttataccct 
tatgaaaagt 
actccgaaaa 
aaagacttga 
attgtggaca 
tttacttata 
cttaattttt 
ctatagttta 
gtaataactg 
agctgttaaa 
tgttattctt 
accatctgtt 
aataattaat 
caattgatgt 
tccttttata 



ccactcataa 
tttttaagat 
ggataagttt 
ctggagtgca 
tctcctgcct 
attttgtatt 
cgacctcagg 
accgtgccca 
tggacctcat 
gtatatttca 
agtaacatac 
taaaaactaa 
tagtgaatac 
ttttcctatt 
gtaaatgtta 
caaataggac 
agaattttag 
agtaagtttt 
aaacatgtca 
atgatagttt 
acagttaaca 
ggccctgttt 
tttgatgata 
ccttctctga 
actgaatcca 
tcatataagg 
tgagaagtta 
tttttaggaa 



ggtattctgg 
actattcctg 
ccataaagtt 
atggcgcgat 
cagcctccca 
tttagtagag 
tgatccgcct 
aagttaattt 
tagacattta 
ttttaaattt 
ctagagtttt 
atctctggat 
agttttacct 
tttaaagctt 
ggttgtattt 
aaataaacct 
gaatgtggac 
gccatctaaa 
ggagtgaaat 
tcaattataa 
ttaaaaataa 
gtatcgtaga 
tagctggtca 
ggcctgaggt 
tagtagtagt 
taacaataga 
ttatagaata 
aagtgggtag 



gacagtaact 
cttaagttga 
aaattttttt 
ctcggctcat 
agtagctggg 
acggggtttc 
gcctcagcct 
ttttttaaat 
ataccaagtt 
atttttaaaa 
gtcatgattc 
aaaatcctct 
tcaggtaaat 
gaattctgtg 
tcatattaaa 
tctaccccta 
agcaaccttg 
tgttttattt 
agataataaa 
atgtagaaaa 
ctatatgtca 
actaactgag 
agacttggca 
aagaacttta 
agtagtaaag 
tttaatgttt 
ggaaaatgga 
tatgatataa 



ttaattgctg 
tcataagtac 
tttttttttt 
tgcaacctcc 
attacaggca 
tccatgttgg 
cccaaagtac 
ccacagggca 
ttgcttaccc 
ttatttcaat 
taacaagggt 
ataactgact 
aaatatacaa 
aactttaagg 
attttgtatc 
caactgctac 
ctaaccttat 
tatagttttt 
taccttgtct 
ccattgcttt 
tagggcttag 
gtcttgtttc 
aaacaagcat 
tattatcatt 
aaatatttga 
taaaaaagat 
tatgagttcc 
ttttgttttg 



61080 
61140 
61200 
61260 
61320 
61380 
61440 
61500 
61560 
61620 
61680 
61740 
61800 
61860 
61920 
61980 
62040 
62100 
62160 
62220 
62280 
62340 
62400 
62460 
62520 
62580 
62640 
62700 
ttttggagac


ggagtctcac 


tcattgtgtc 


gcccaggctg 


gagtgcagtg 


gcatgatctt 


62760 


agctcactgc 


aacctctgcg 


tcccagaatt 


caagtgattc 


tcctgcctca 


gcctcccaag 


62820 


tagctgggat 


tacaggcatg 


tgccactatg 


cccagctaat 


ttttgtaatt 


ttagtagaga 


62880 


tggggttttg 


ctatattggc 


caggctggtc 


tttcatacct 


gatgtcaagt 


tatccaccca 


62940 


tttcggcctc 


ccagagtgtt 


aggattacaa 


gtgtgagtca 


ctacatctgg 


ccaaattttg 


63000 


atatcaaggt 


gagagagatt 


taaaattaag 


ataaggtaca 


aaaattagcc 


tagtgtgggg 


63060 


gcgcacgctt 


gtagtcccaa 


ctactgggga 


ggctgtggca 


ggagaattgc 


ttgaaccagg 


63120 


aggcagaggt 


tgcagtgagc 


caagatggca 


ccactgcact 


ccatcctggg 


tgacagagcg 


63180 


agatgtcatc


tcaaaaacaa 


aacaggccgg 


gtacggtagt 


tcacgcctgt 


aatcccagca 


63240 


ctttgggagg 


ctgaggcggg 


cagatcacga 


ggtcaagaga 


ttgagaccat 


cctggctaat 


63300 


atggtgaaac 


cctgtcttta 


ctaaaattac 


aaaaattagc 


tgggcttggt 


ggtgtgtaac 


63360 


cccagctact 


cgggatgctg 


aggcaggaga 


atcgcttgaa 


cccgggaggc 


ggaggttgca 


63420 


gtgagccgag 


atcacgccac 


tgcactccag 


cctggcgaca 


gagcgagact 


ccgtctcaaa 


63480 


aaaataaata 


aaaatttaaa 


aagataaata 


cataaaaata 


aataaataat 


attaagagaa 


63540 


ggaaatcagg 


caggtagtgg 


cccctgacac 


aatgagtttt 


cccagaattg 


gattgcttgg 


63600 


aaatgccgct 


caaagagtgt 


ggtaaactcc 


atcgaaggct 


aaataccaac 


gtgacagtga 


63660 


taataaacaa 


gtactttatg 


ggaaagtttt 


ttttttaatt 


atttttaaaa 


agagagaaat 


63720 


tgtactggag 


aaaagaggaa 


ttcaggtaga 


ataattcttt 


tttttttttt 


ttttgagatg 


63780 


gaattttgct 


cctgttaccc 


aggctggagt 


gcagtggctt 


gatgttggct 


cactgcaacc 


63840 


tctgcctttg 


gggttcaagt 


gattctcatg 


cctcatcctc 


ctgagtagct 


gggtctatag 


63900 


gcatgcaccc 


caacacctgg 


ctaatttttg 


tgtttttagt 


agagataggg 


ttttaccatt 


63960 


ttggccaggc 


tggtctcaaa 


ctcctggctt 


cagatgatcc 


gcccaccttg 


gcttcccaaa 


64020 


gtgctgggat 


tacaggcaga 


ggccactgtg 


cctggcatga 


agaacaattt 


ttaaaagaat 


64080 


gacttttaag 


gatattaagt 


catcaaagta 


gatagagcca 


ttaaatgatg 


ggtagaaacc 


64140 


taatcttcca 


tcccatattt 


tatgacttat 


taaggaagat 


aggcaatctt 


gcagttgata 


64200 


ataaatattt 


ggctttccat 


acttagcacc 


gttttgaatt 


tttccagttc 


acagatggta 


64260 


tatttagtgc 


tgttccaata 


atattgcaaa 


aaataaatct 


gaagactcac 


ttctgggtca 


64320 


cagttatttc 


actattaaca 


ttaaaatctt 


acggacctac 


ctgcaacatg 


tagtggaaat 


64380 


aagttgtgtg 


gcacgttgtg 


gtgggtgcat 


tattaaataa 


atgtgcaaag 


gttttatggc 


64440 
tgtctcaatt
tgtaagtgag 
ttactcattc 
cccagtaaaa 
tttatgtcat 
cattttgctg 
ttgcttaaag 
gattcattga 
agtggtagac 
ccacctaatt 
taaaattgtg 
tttgctttac 
atttaggcat 
aaatgccagt 
aaccttagta 
gtctaaaggt 
actttttttc 
tttcacaata 
ataacacgcc 
cttgcataga 
tcttgagtat 
aaaactggtt 
ttacttaaaa 
tattgaaaaa 
aacattaaac 
aatagcagaa 
tatatcacag 
cttctgtcct 



tttccatagt 
taccaactgt 
taagtacata 
taattagata 
gtaccacctc 
gagttatata 
aggatttaca 
taaagaaact 
ctaggaaata 
cttacattta 
aaggtaatac 
ccaggcctat 
attttcatgg 
tgcaagttat 
aacttagcaa 
ttatatcctg 
cttgtagttc 
gactcacaaa 
tctattgtgt 
ccacaataga 
actggaattt 
ttaaggagat 
taagtgcttt 
cttctccatc 
ataatttttg 
aatagaacat 
tgatctgtct 
tatttcatag 



cttgagtcat 
actagaactt 
aaatcactgg 
attattaaaa 
aggtcacact 
agattagtaa 
taattgtaga 
agtcctaaga 
cctttctaag 
tttattaagg 
caaggaaata 
agagaacata 
tagtgacata 
ttcaaataaa 
actgagacct 
gtacaaactt 
agatttccaa 
atgaatcttg 
aaaaaaatca 
gacattgtta 
ttatttcatt 
aatactaaag 
aaatcttgct 
ttccttcaat 
aaagtagcca 
gccctctcct 
aaatattttt 
atctcttctg 



tcacatactg 
actgaatatt 
tctgatatgc 
tagaaagcat 
ttgggaaacg 
atgctataga 
cctgaaaagg 
gctaaaatga 
attagagctg 
gacatgcttt 
tttaggctat 
tctaaaataa 
atgtgcaatt 
gttgaataac 
aaggaagagc 
gtctatacat 
gaaagggaat 
ttaagcctat 
gctttattcc 
gattatatca 
gtagaaacac 
atactcactt 
cattcatgaa 
cccataggac 
atattaatct 
ctcgcccaaa 
ctcaaatatt 
tatcctgtgc 



cctttatggt 
tttcaagtct 
aagttaggtt 
ttgtcattat 
tagcttacgt 
ctaaatattg 
atcttagcag 
cttgctcaag 
cctggttagt 
ttacggtaga 
gcaagaaaag 
taagtaacat 
aagaagctat 
ctgacctcag 
aagatctgtg 
caccacacct 
attattgtcc 
gaatgggtaa 
taattccctg 
agaaggtgag 
agaatcataa 
gaggtagatt 
gccatagccc 
catgttttta 
ttgacaaaaa 
agtaagacta 
ttactaagaa 
agcttgtcta 



gtttatcact 
attttactta 
ttcgctaata 
gtgaactgga 
attgagctaa 
aaaaaaaagc 
tcttctgatc 
attttagtta 
gctatagctg 
aaaatatatc 
tcattcagct 
gtgaatttga 
aataatgtaa 
gaggggcaga 
gtagagccag 
gccaaatgga 
tagctcagcg 
gacttatccc 
tagtaggaag 
tggcaggcat 
ttagtcatct 
gttctcagaa 
tggaaggaga 
attgtagttt 
actcaaacta 
cattttaaag 
agcgtatatt 
tcatttcgag 



64500 
64560 
64620 
64680 
64740 
64800 
64860 
64920 
64980 
65040 
65100 
65160 
65220 
65280 
65340 
65400 
65460 
65520 
65580 
65640 
65700 
65760 
65820 
65880 
65940 
66000 
66060 
66120 
cttttatatt

agaatgtgat 

tttctttctt 

cttaggctta 

catgaatttc 

tcaaaaatgt 

taccttttcg 

cacacacctt 

tttagtgctt 

cagccttttt 

cgattgataa 

aacctgttaa 

accttcttct 

ctaatctgtc 

tctcctgtct 

ctacatcttt 

ttgaaaatat 

agttggaaac 

gtgctgcttt 

ctatttccca 

tatgcttgat 

taggtggggg 

caacctcagc 

cccgagtagc 

tagagacggg 

cctcccaaag 

ctagatattc 

tttttgtttc 

ctgtcaccca 



tcttattttg 

tgttctccag 

tctttttttt 

agtgatcctc 

caaatatgat 

tagaacttac 

aatctattaa 

ctactcctgt 

ttaaatatta 

ttcccctaaa 

ttcttctgtt 

ctcaggtctt 

tttcattcga 

ttcactacca 

tttaaattcc 

cagtaactga 

ctttattctg 

tattttccat 

gtattagaga 

aagctttttg 

gcatttactt 

caggggggaa 

tcactgcaac 

tgggattaca 

gtttcaggct 

tgctgggatt 

atccttcaat 

atctttctca 

ggctggagtg 



ttctcttgct 

ttttaaaaag 

tttttgattg 

ctgtctcagc 

gttatctttc 

ctctttaaat 

cttgcatatc 

cctcccactt 

ttcattacca 

taattgtgtt 

ccatttcagt 

tttttttttt 

gaccaaatgc 

tcataatttc 

atattttcca 

gaaaaggttg 

ctgtcttctc 

gagtattttg 

cagaagatca 

gatcttctct 

tttcttttct 

ggagtctcac 

ctccacctct 

ggcatctgcc 

ggtcttgaac 

agaggcatga 

tctgagaaag 

cttttttttt 

cagtggtgcg 



gcttaaatgt 

ctcttctcta 

agacagtctc 

ctcccaagta 

atataagctt 

aatcttataa 

atagtaaatg 

tttctcaccc 

agccaagtag 

attttttcat 

ctaatttcca 

ttttttccca 

tttccatgcc 

cttttccttt 

ctttcatgat 

tgtgagggaa 

attaataatt 

aaggcattat 

ggaatcagga 

ttattcttat 

tcttccctct 

tcttctcacc 

ggggttcaag 

accacgccca 

tcctgacctc 

gccactgcgc 

gttctataat 

tttttttttt 

atctcagctc 



gattctttga 

tctactaaaa 

gctgtgttct 

gctgagatta 

aaacataagc 

agccatttct 

aacccttagc 

aggctggagt 

cgtactattt 

ttgcttcatt 

catggccaaa 

gaataccttt 

tgtgtcaaag 

tttcctttct 

ttattttctc 

tattttgaga 

tgactggatg 

ctattgtctt 

tagcattgga 

cttctgatat 

tgtacatttc 

caggctagag 

cgattctcct 

gctaattttt 

aggtgatccg 

ccagcccctc 

tctttaattt 

ttttaagaca 

actgcgccct 



gatttataga 

tttccatgaa 

ccagcgtggc 

cagaaatttc 

ctttcttctt 

taatttttgt 

tccatcatac 

acagtgaata 

tttccttgta 

ttcatcttat 

ccaatctaaa 

tcctggaacc 

tggttgttct 

ttattatgta 

attttgatgg 

tgctgtagtt 

tcaagattta 

ctagcttcca 

cttcttattt 

tttataatga 

tttctttttt 

tgcagtggtg 

gcctcagcct 

gtatttttag 

ccctcctcag 

ttggaccttt 

aaaaaaagtt 

cagtcttgct 

tcgcctcctg 



66180 

66240 

66300 

66360 

66420 

66480 

66540 

66600 

66660 

66720 

66780 

66840 

66900 

66960 

67020 

67080 

67140 

67200 

67260 

67320 

67380 

67440 

67500 

67560 

67620 

67680 

67740 

67800 

67860 
ggttcgagtg
caccatgccc 
gctggtttca 
attacaggca 
atgtctctga 
attcctcctc 
tggttattct 
gtgctgtaag 
gcaagcctac 
ggggttattt 
tgactattgg 
gccatgtgaa 
tctgttgccc 
aggttcaagc 
ccacgcctgg 
tagtctcgaa 
tacaggtatg 
gagggatagg 
agattgcttg 
atttaaaaaa 
aaaatcccaa 
tagaatacaa 
tggaatttga 
cactttggga 
caatatggtg 
tgcctgtaat 
agaggttgca 
tccgtctcaa 



attctcctgc 
ggctaattgt 
aactcctgac 
tgagccaccg 
gaggttttgt 
ttttcttttt 
ggttgtccac 
taagtaaatt 
tgtttatttg 
ggcttctcta 
tgttccagaa 
attactaggt 
aggctggagt 
aattctcatg 
ctaatttttg 
ctcctgacct 
agccaccaca 
catggtggct 
agtccaggag 
aaaaaataga 
agaaccaaca 
agttaatata 
aattaaaaca 
ggccgaggtg 
aaaccccatc 
ctcagctact 
gtgagccaag 
aaaaaaaaac 



ctcagcctcc 
tgtattttta 
ctcaggtgat 
tgcccagccc 
caagttttct 
gccttaagct 
ttatattacg 
gttgatagtg 
ggtttctcca 
gaaaagaatt 
aatgggtggg 
tttgttttgt 
gccttagcat 
cctcagcctc 
tatttttagt 
caggtgatcc 
cctgggctga 
cccacctata 
ttcgacacca 
tgaatatctt 
aaaagagctc 
caaagtcaca 
caatactgct 
ggcagatcac 
tctactaaaa 
caggaggctg 
atcgcgccat 
acaatacctt 



caagcagctg 
gtagagacgg 
ccacctgctt 
tattctgatt 
tttgtttact 
ttttcatatt 
tggaacaata 
agagcccatt 
aatggctata 
ttctagtctc 
aggagacgac 
tttctgtttt 
gatcttggct 
ccaagtagct 
agagacgggg 
acctgccttg 
aattactgtt 
atcccagcac 
gcctgggcaa 
tgtttgcaga 
ctagaactaa 
ttgctttttt 
gggtgcagtg 
ctgatgtcag 
atacaaaaat 
aggcaggaga 
tgcactccag 
tcatattaac 



ggattacagg 
agtttcacca 
tggcctccca 
tcatagatgc 
ttattatctg 
ggaagcttct 
aaaaagttga 
aatcacagga 
gctgtcagtc 
ttgtttggag 
tttgttttct 
gttttgagac 
cactgcaacc 
gggattatag 
ttttgccatg 
gcctcccaaa 
tttataggtc 
tttcagaggc 
tgtagtgaaa 
taacatgatt 
taagtgatta 
atctaccagc 
gctcacacct 
gggttcaaga 
tagctgggca 
atcgcttgaa 
cctgggggac 
actaataaaa 



tgcccaccac 
tgttggccag 
gagtgctggg 
agtgtctttt 
tttcttcaag 
ctcaaatgtc 
ttaggactct 
tgatcaagca 
ttttttttct 
aatacaagct 
gttaggttga 
agagtctcac 
tctgcctccc 
gcatgcacca 
ttgaccagtc 
gtgctgggat 
aaaaacagtt 
caaggcagga 
ccccatctat 
gcatatgtag 
tgacaaggtg 
agtgaacaac 
gtaatcccag 
ccagcctgac 
tggtggcggg 
cccaggaggc 
agagcaagac 
tgaaatatgt 



67920 
67980 
68040 
68100 
68160 
68220 
68280 
68340 
68400 
68460 
68520 
68580 
68640 
68700 
68760 
68820 
68880 
68940 
69000 
69060 
69120 
69180 
69240 
69300 
69360 
69420 
69480 
69540 
gtagttctaa
agaaatcaaa 
tattattgag 
accccagcca 
cctagaacag 
ttcaagactt 
caaataaatc 
ctgatctttg 
tggtatagga 
ctttcacaaa 
ggataacata 
cttaggtaca 
taaattataa 
gagtgggaga 
gaacttttaa 
ctgaacagac 
caacatcatg 
tattagaatg 
agcagaaact 
agttggacag 
atatggtcca 
ataaaaaaaa 
agcagtcaag 
ggagtattat 
ccataaatgc 
tgactatatg 
ccaagggaga 
tgttctttat 
aatataaaac 



caaagtttgt 
gatctaagta 
atgtcagttc 
gttattttgt 
ccaacacagt 
actgtaaagc 
agtagaattg 
gcaaagggac 
caactggaca 
aataaatgga 
ggagaaaatc 
ataccaaaac 
acttctgctc 
ctttacaaaa 
aactcaacaa 
acctcaccaa 
tcgttaggga 
gctgaaatct 
ttaattcatt 
ttttttacag 
gcagtcttac 
ctgcacatga 
gcatccataa 
tcagcaataa 
atattgctaa 
atgttttgga 
cagggagaga 
gataatccaa 
ataagagtga 



tgtagaagat 
aactgagaga 
ttcccaagtt 
ggatactggc 
attgaagaag 
tacattaatc 
gacagagagc 
aaagacaatt 
tccacatgca 
tcatagacct 
taggataaca 
atgatccttg 
tgtgtaagat 
ttcattatct 
taagaaaata 
ggaagataga 
aattgcgcat 
aaaacactga 
gctgatggaa 
aactaaagac 
tccttagtat 
atatttctag 
gtaggggaat 
aaagaaatga 
atgaaagaag 
aaaggcaaaa 
tgaataggtg 
tggtggatac 
accctaatgt 



ctatatgaga 
tattccatgt 
catatatcga 
aaactaaagt 
aaaaaagtca 
aagacagcat 
ctagaaatca 
cagtggagaa 
aaaaagttaa 
aaatgtaaca 
tgagaaaaat 
aaaaaaaaaa 
gctgttagga 
gatgaaggac 
tacaacccag 
tagatgacaa 
taaaacaaca 
caacaccaaa 
atctaaaatg 
agtttgacag 
ttacccaaat 
cagcattatt 
ggataaacag 
tctatcaagc 
ccagtctgaa 
ctatggaaac 
gagcacagtg 
atgtcattat 
aaaatatgga 



agaattatag 
aaatggacag 
ttcagtgcag 
ttatatgaaa 
gaggactgaa 
gtcattggca 
acccacacag 
aagatagcct 
tctagacaca 
tgcgaactga 
ttttggtttg 
tcagtatgtt 
gaatgaaaag 
cagtatccaa 
ttaataaatg 
caagcatatg 
acaagatacc 
ttctggcagg 
gtagaaccat 
tttcttacaa 
aagtttaaaa 
catagttgcc 
actttggtat 
cacaaaaata 
gaggctacac 
agtaaataga 
gatttttaag 
acctttgtca 
cttcagttaa 



cactcatgaa 
ggagactaaa 
tcccagtcaa 
aggcaaaaga 
actacccaat 
aaagaataga 
ataaagtcaa 
tttcaacaaa 
gacctgacaa 
aacttctaga 
gcagtgactt 
gaactttgtt 
acatgcagca 
aatatacaaa 
ggcaaaatat 
aatatatgct 
ctgccatccc 
gatgtggagc 
tttggaaggt 
aactcttacc 
tgtacatcca 
aaaacttgga 
atcatgtaat 
tatggaggaa 
tataggattc 
tcagtggttg 
gcagtgaaac 
aaacccacag 
taataatata 



69600 
69660 
69720 
69780 
69840 
69900 
69960 
70020 
70080 
70140 
70200 
70260 
70320 
70380 
70440 
70500 
70560 
70620 
70680 
70740 
70800 
70860 
70920 
70980 
71040 
71100 
71160 
71220 
71280 
tgaatatttt
gggaaattgg 
gtaaacctaa 
attttttgag 
gctcactgaa 
gctgagatta 
ggtttctcca 
cggcctccca 
ttaagttgga 
tattaaagtg 
tagtgtccct 
tgctttggtg 
tttcaatcag 
gcccgggttg 
acgccatttt 
ccggctaatt 
ctcgatctcc 
cgtgacgact 
tactgctttc 
acgtcagcgt 
tctgctaagt 
tctccttttc 
gtgggatttc 
tttcattttt 
ttttgagaca 
actgcaccct 
ggagctcagg 
cagagttccc 



ttcattagtt 
aaaggaatga 
aatcactaaa 
atggagtttc 
acctccgcct 
caggcatgtg 
tgttggtcag 
aagtgctgtg 
tattagccat 
taggtgagat 
gaatctggag 
gtgattggta 
ttcctgtgtt 
gagtgcagtg 
cctgcctcag 
tttttgtatt 
tgacctcgtg 
gtaagccacc 
tgtggtgcct 
gcttcttgca 
gatttaccac 
tatttttgtt 
taagtagaga 
accaccgagg 
gggtctcact 
tggcctccca 
tgatccagca 
ctatgttgcc 



ctaacaagtg 
gaggttatat 
aaaaaagttt 
actcttgttg 
ccagggttca 
ccaccacgcc 
gctggtcttg 
attataggcg 
ttcatatgat 
gttttcagtc 
actctgactt 
cctgcttgac 
tgtttgtttg 
gcgcaatctt 
cctcctgagt 
tttagtagag 
atccgcccgc 
gtgcccggcc 
gataacatcc 
tttaggtatc 
actttatcag 
tggaagattc 
aacatgccta 
agttggtttt 
ttgtcagcca 
agctcaagtg 
cacccagcta 
caagctggtc 



tactacacta 
gggaactctg 
atttttattt 
cccaggctgg 
agcgattctc 
cagctaattt 
aactcctgat 
tgagcagctg 
tcaacttaaa 
tggagctcta 
atttctttag 
tgcctttggt 
tttgtttttg 
ggctcaatac 
agctgggacg 
acggggtttc 
cttggcctcc 
agttcctctg 
aattcctgaa 
ccttcacaag 
ttctccattt 
atatattttt 
tgttcaatat 
atttcttttt 
ggctggagta 
atcctcccac 
attttttttt 
ttgaactctt 



atacaagata 
tactttctgc 
ttattttttt 
aatacaatgg 
atgcctcaac 
tgcagtttta 
ctcaggggat 
cgcccagcag 
agtacataca 
cccttgattc 
agaacgaaac 
ggggagttcc 
agacggagtt 
acgctccgcc 
acaggcaccc 
atcgtgttag 
caaagtgctg 
tttttgactg 
ccttcctggg 
taggcattta 
tgtggaattc 
tattcattta 
gtcttgttta 
tttttttttt 
cagtggcaca 
ctcagcctcc 
tttttaattt 
gggctcatgt 



ttcagagtag 
tcaattttct 
ttaattttta 
cacgatctcg 
ctcccgagta 
gtagagacag 
ctgcccgcct 
gttttttttt 
ccttcactgt 
ctgccatgcc 
tcctgccttc 
tcataccaac 
ttgctctgtc 
tcccgggttc 
gccaccacgc 
ccaggatggt 
ggattacagg 
cctgccttac 
atttttgttc 
ggttttaagc 
attgtgttaa 
gtattttttg 
agcagtctgc 
tttttttttt 
atcaaagctc 
tgaatagctg 
tttgtagaag 
aatcctcctg 



71340 
71400 
71460 
71520 
71580 
71640 
71700 
71760 
71820 
71880 
71940 
72000 
72060 
72120 
72180 
72240 
72300 
72360 
72420 
72480 
72540 
72600 
72660 
72720 
72780 
72840 
72900 
72960 






tctttgcccc 
tattacatta 
gatgctactg 
tcatagttgt 
agagggctgt 
ttcaaatttg 
tgtccttgag 
tttgtttgtt 
tcaagcgatt 
acctgtctag 
gtcgaactcc 
ggcatgagcc 
gaagtagatg 
gacagtatga 
ctatattgag 
aagtaatcag 
atcaggctct 
gaaagagtaa 
agaagagagg 
ctcaggagga 
aggaacacat 
tttttattaa 
tgtgatctcg 
ttcctgagta 
attagagatg 
tgcgcctgcc 
tcatagctta 
tttaaatgta 
cagtagctgc 



ctaaagtctg 
atttatgatt 
aaaaaaggat 
aaactaaagt 
tactctttgg 
agttttctgt 
tctattattt 
ttgttttgtt 
ttcatgcctc 
gttttttatt 
tgacctcaag 
actgcacctg 
gaaaaccatg 
ttgtcattat 
tatctttcta 
tatgttatag 
ggatacctct 
cttttcctta 
agaataacac 
gagtagaaag 
tgattgccat 
tttttttttt 
gctcactacc 
gctgggatta 
ggatttcacc 
tcggcctccg 
catttttaga 
atatattgaa 
agaatcgaat 



ggattacagg 
atgtgtttcc 
gctttttaga 
atatattttt 
tccacctggg 
tgagatattt 
acgacttgct 
ttgagtgatc 
agcctcccaa 
tttggcagag 
cagtcctcct 
gctgatacac 
tacgttatct 
tttttgtaaa 
acccctgatt 
cttttttttt 
ttcctctgca 
gatgtttgtc 
tgtctctctt 
aagcacagct 
gtattgggga 
tagacagtct 
atctcttcct 
caggcatgcg 
atgttggcca 
aaagtgctgg 
gaatcttttc 
ctaatttaat 
gcaaccttct 



tgtgagccac 
tttaaagcta 
tggcaaagag 
tagttgttca 
aatgggaaga 
gggataatat 
ttttgctatt 
tgggctcacc 
gtagctggga 
acaggatttc 
acctgggcct 
ttttaagttt 
tcagtagtgt 
ttaaattttt 
tttgcttcta 
tttttaagta 
tagtcctcct 
tttctcaaag 
tttttttaaa 
cttcctataa 
ttgtattata 
tactctatca 
tctgggttca 
ccaccacgcc 
ggctggtctc 
gattacaggc 
tagtacttaa 
atttgctctt 
ttaatataag 



cacacccagc 
tgggcagctc 
tacttaaaat 
cagggcttag 
caatgctggt 
gaaaaaaaga 
gtacactttt 
gcaacctccg 
ttacaggcac 
gccatgttgg 
cctaaagtga 
ttcagctact 
gtttttggtt 
acctggaaga 
ctatcataat 
ttcttttgcc 
ggatggaaga 
cagttatctt 
tctctctcta 
cctgtcctta 
ccttacattt 
cccaagctgg 
agcaattctg 
tgactaattt 
aaactcttgg 
atgagccacc 
atcggtaaat 
gtgattttta 
tgctgcaagt 



tgtttttaat 
tgtttgggaa 
gtctctagaa 
agctcctgcc 
aagggttctc 
aactttatct 
gttttttttg 
cctcctgggt 
acgccaccac 
ccaggctggt 
tgggattaca 
tttcaatgta 
ggttaaattt 
gcttacctta 
aactttattt 
agaagttttt 
aacaaagagg 
tgtatatcta 
ctcattctct 
ttactgagaa 
ttatttttat 
agtgcagtga 
ccacgtcagc 
ttgtattttt 
cctcaagtga 
acacctggcc 
atggttatct 
aaggctaaag 
ttaacttcaa 



73020 
73080 
73140 
73200 
73260 
73320 
73380 
73440 
73500 
73560 
73620 
73680 
73740 
73800 
73860 
73920 
73980 
74040 
74100 
74160 
74220 
74280 
74340 
74400 
74460 
74520 
74580 
74640 
74700 






aatacgtgag 
gtgttaactg 
tctattattt 
tgtcaaatac 
taaacatcag 
atgttggcag 
ctcctggctt 
ctcctatctc 
tttcccctgt 
attctatgtc 
tggagaacac 
agtggatttt 
tatttctggc 
tttttttttt 
ctgggcttaa 
caccacacct 
tttttccatt 
ttggggtagg 
ctagtgtgta 
cactgctttg 
ataccttagt 
atgcctataa 
tggagaccag 
caggtgtggt 
cttgagcctg 
agtgacagtg 
cctgtaatcc 
ataccaccct 



tgctctgttt 
taaatggtaa 
acatatgatg 
tgtattagtt 
caatttattg 
ggttgcttcc 
ctggtggctt 
tgtctttatc 
ggattaggac 
caaataatgt 
agttcaaccc 
atctcaagga 
cttctgattc 
tttataatag 
gcaatcctcc 
ggccagtaga 
caccctctgc 
gcaagcataa 
aatataactg 
tctttctccc 
ctctgcttgc 
tcctagcatt 
cctaggcaac 
ggcatgcacc 
agaggttgag 
tgagaccctg 
cagtactttg 
gggcaacata 



ccaatattgt 
tatttcatga 
aatatctatc 
tactggggcc 
tctcacagtt 
tcctgagggc 
gctggcaatc 
tttatgtggc 
ccaccctaat 
cacattcata 
attaacaaat 
gccaccagat 
tgatctaaaa 
agacacagtc 
tgcctcggac 
ttttccctgc 
tgaccctata 
ctacagtgct 
gtgcattgca 
tctctccctt 
ctttatcaaa 
ttgggaggct 
atggtgaaac 
tgtagtccca 
gctgcagtga 
tctcaaagaa 
ggagactgag 
gtgagacccc 



cgtattttaa 
aaatattttt 
ttcagagtag 
atgtaataaa 
gtggaaggta 
tgtgagggaa 
ttttgtattc 
attctccctg 
gatctcaatt 
ggtactaggg 
actatcactt 
aggaacacag 
tatgacagaa 
tcgctatgtt 
tctcaaagtg 
tttcttttga 
gtattattca 
taaagagagt 
aaactgtgaa 
tctcttggtt 
acctttatga 
gaggcaggaa 
cccatctctt 
gctactcagg 
actgtgatca 
acaaaacaag 
gcaggaggat 
catctctaca 



gttactgtct 
ctaggagctt 
aaagttatgt 
ataccgtaaa 
gaagtctaag 
aatgtatgtt 
cttggcttgt 
tgtctgtcac 
taagtttgtc 
ataggacttc 
tccacttaag 
atctgatggc 
gattttccct 
gccaggctac 
ctggaattac 
ttgtttataa 
aagaagtgtt 
aatttgtctg 
gtagtttctg 
gccccctccc 
ttggccgggc 
gatcacctga 
ccaaaaataa 
aagctgaggc 
tgccactgca 
ggggggcatg 
tgcttgaggc 
aaaaataaaa 



aaatgttact 
atctattgta 
acatttgtgt 
ctggctggct 
atcagtcaaa 
gtgtgcctct 
agatgcatcc 
catgtccaaa 
atcagcaaca 
aacacgtttt 
cttcaagtaa 
ataaactgag 
gttttaattt 
tttcaaattc 
aagtgtgagc 
ttttgttttc 
cagtctagtt 
gtgtgcagaa 
tcaaacctta 
cctcccaatg 
acagtggccc 
gcccagaagt 
aaaatagagc 
gagaggatct 
ctccagcctg 
gtggctaact 
caggagttca 
aatttagctg 



74760 
74820 
74880 
74940 
75000 
75060 
75120 
75180 
75240 
75300 
75360 
75420 
75480 
75540 
75600 
75660 
75720 
75780 
75840 
75900 
75960 
76020 
76080 
76140 
76200 
76260 
76320 
76380 






gacatgccag 
agtcaaggag 
tcagagcaag 
tggcaaacat 
gtttgaagtg 
acgcctgtaa 
tcaaaaccag 
cagacttggt 
ctcgaaccca 
ggcaactgag 
agaggtgatc 
tttactgtct 
tctatgtctt 
tctccctgta 
gagttaaagc 
ccatcactat 
aagactccct 
ttcctcttca 
attcttcaga 
tattattctc 
ttcattgata 
tctccatgcg 
gtgagtaggt 
catagctcta 
aaccaggaag 
ttacagcaac 
tgcagtggcg 
gccttagcct 
gtattttagt 



cgaatacgtg 
gttgaggttg 
accacgtctc 
acttaaactg 
tcagcatttg 
tcccagcact 
cctggccaac 
ggtgtgcgcc 
ggaggcagag 
tgagactctg 
accacatgat 
acagttttgc 
cggaacagac 
cctatagaca 
aaacttgaca 
gatagagata 
cttccttagg 
tggagtttcc 
taccagttga 
ctagcaccac 
ggattatttt 
ggcagagact 
attgaataaa 
ttttattaat 
tgggaaacta 
cattctcttt 
tgatctcggc 
gctgagtagc 
agagacgggg 



gtcccagcaa 
cagtgagcca 
aaaaacaaaa 
aaatgtgaat 
ttgcaccgaa 
ttgggaagcc 
attgcaaaac 
tgtaatccca 
gttgcagtga 
tctcaaaaaa 
atccagatag 
agcgagagaa 
atgagatcag 
tgtatgggaa 
tgttgaccat 
gccttggtgt 
cagttcttca 
tttctcagga 
aatgttacaa 
ttgcacctca 
aattgtatct 
atcaccatgt 
tatttgtgga 
tagggaatta 
ggatttgagc 
tttttttttt 
tcactgcagc 
tgggactaca 
tttcaccatg 



atcaggaggc 
cgatcatgcc 
caaaacaaac 
ctctgatgaa 
atccagaggt 
aaggctggca 
cccgtctcga 
gctattcggg 
gccgagatgg 
aaaaaaaaaa 
cctctttcca 
acttgatttt 
aattgtctag 
acttatttgt 
agttgttatg 
tggacccata 
aatattatta 
acactatctt 
ccttccctga 
ttttcatact 
gatatcactg 
gttcttcact 
gtaatcataa 
cagaattcag 
tccagtgagt 
ttttttttga 
ctctgcctcc 
ggcacccgcc 
ttggccagga 



tgaggtggga 
aatgcattcc 
ttttatggtt 
agaacatgtt 
gaggccaggt 
gatcacctga 
ctaaaaatac 
aggctgagac 
caccactgca 
aaaaatccag 
tgagaggctc 
atcagtacac 
ctgctatgaa 
aaggttgtat 
gcattggact 
gtttttgaag 
tgctttttcc 
ctcctggtta 
ccctccaaac 
cactggagtt 
tcacctccac 
tcaattttca 
tgaggtatag 
tgatctgctc 
gtggcctttc 
gatggagtct 
cgggttcaag 
acgacacttg 
tggtctcgat 



ggatcacttg 
aacctgggcg 
gaaagtgttt 
acctgtaaaa 
gtggtggctc 
ggtcaggagt 
aaaaattagc 
acgagaattg 
ctccagcctg 
aggtgaatcc 
aaaggataat 
caagagcaga 
cagcatgttc 
aatgagcagt 
aaagtagcct 
tctgtttgct 
tgtcatggga 
atttgtactt 
tattccccgt 
gcaattcata 
tagaagatgg 
gtagttggct 
atattattct 
aggatctcat 
attaaaaata 
tgctctggag 
tgatccccct 
gctaattttt 
ctcctgacct 



76440 
76500 
76560 
76620 
76680 
76740 
76800 
76860 
76920 
76980 
77040 
77100 
77160 
77220 
77280 
77340 
77400 
77460 
77520 
77580 
77640 
77700 
77760 
77820 
77880 
77940 
78000 
78060 
78120 






tgtgatccgc 
ggcctacagc 
tgttttcaga 
atcttgaagt 
atcccagcac 
ggctgggcaa 
gtggcgtgtg 
tgggaggtgg 
agagtgagat 
attttgtcag 
taatttttac 
ttcaaatata 
attttagttg 
atattttacc 
gcagtggctc 
ttgagaccag 
actaaatact 
acatgcctgt 
ggcggaggtt 
aaactccatc 
tcatgatcta 
gtgttattaa 
tgttgtatta 
cataagtata 
tcaccaccca 
caccatcttt 
aaagctagca 
gtatttggct 



ccgcatctac 
aaccattctc 
aacaacatag 
aagtttactt 
tttgggaggc 
catggcgaaa 
cctgtggtcc 
agtttgctgt 
cccatctcaa 
ttctatctaa 
atacttgatt 
taggaactta 
tctttctatg 
tcagatgtga 
acacctgtga 
cctgattaac 
aaatactaaa 
aatcccagct 
gcggtgagcc 
tcaaaaaaaa 
tcctgagaaa 
aatatctaga 
ttgtttaacc 
cagctgtata 
ggtcaagaaa 
atccactcac 
tgcctttgaa 
tctttcatca 



ctcccaaagt 
ttttatccat 
cattcatgat 
ttaagaaagt 
caaggcaggt 
ccctgtctct 
cagccacttg 
gagccgagag 
aaaaagaaaa 
tacaattttt 
taacaaaact 
atgttatatc 
gtctcatgcc 
cctgactttg 
tcccagcact 
atggagaaac 
tctctactaa 
actcgggagg 
gagattgcgc 
aaagcagcag 
gcttttggga 
ttattttcca 
ttgtttattg 
cagctcagtg 
taaattgtta 
tcctctccct 
ctttatataa 
gcattgtatt 



gctggattat 
acttttttca 
cttaaccccc 
tgaggctagg 
ggatcacttg 
accagaaata 
ggagactgaa 
atcatgccac 
gaaaaagaaa 
tccttatgtc 
caatcttttt 
tgcttccctc 
agtttgtcag 
aagacttaaa 
ttgggaggcc 
cccatgtttc 
aaatacaaaa 
ctgaggcagg 
cattccactc 
cattgtgtaa 
ggaactgcat 
caaaaaatca 
aataactaac 
gattaccaca 
cctgtggccc 
caaaaccact 
atctaatgat 
tgtgagattt 



aggcgtaagc 
agagtactgt 
aattctgata 
tgtggtggct 
agctcaggag 
caaaaaatta 
gtgggaggat 
tgtactccag 
attgaaatgt 
taactgaaat 
ttttttttat 
cagtccccag 
tatgcctaga 
aaggaagcat 
gaggtgggca 
tctactaaat 
gatgagctgg 
agaatcactt 
cagcctgggc 
tattatgtag 
catagtcatg 
gttacatatg 
atgtagaaaa 
aagcgaatat 
taaaatccct 
agactactaa 
gtaggatttt 
atccagattg 



caccacaccc 
ttcatcttca 
ctgcctgaat 
catgcctgta 
ttcaagacca 
gtcgggcgtg 
ttcttgagct 
cctgggtggc 
ctagtctatc 
ctgctttttc 
gagacagcct 
aatagttact 
taagaactga 
tgtgccaggc 
ggtcaggagt 
acaaatctct 
gcattgtggc 
gagcccagga 
aacaaaagtg 
atgttgtgtc 
gacaacattt 
tatcttaaca 
gtatttatag 
actttcataa 
ccaggcactc 
catcatagac 
gtgtgtatgt 
ttgcaagtag 



78180 
78240 
78300 
78360 
78420 
78480 
78540 
78600 
78660 
78720 
78780 
78840 
78900 
78960 
79020 
79080 
79140 
79200 
79260 
79320 
79380 
79440 
79500 
79560 
79620 
79680 
79740 
79800 






ttgtagttgt 
ttaacctgct 
ttacagtgtt 
tgctgctaga 
acctccacta 
aagtaaaatc 
ctcaatattc 
tttacttcct 
aagacagggt 
agcctcagcc 
ataggcacat 
gctcacacct 
gttcgagacc 
agcggggtgt 
ttgcttgaac 
attttttttt 
ctcaagtgat 
tgcctggtct 
gaaagtattg 
gacaaggtct 
aacctccaac 
acaggcatgc 
catgttggcc 
caaagtgctg 
cctgaactga 
cataggagat 
tgtgtgtgtg 
tctcggctca 
gagtagctgg 



gcttttttac 
tatctattat 
ttttccacac 
tttaaatact 
gaagatggtc 
ttctgtaagt 
taaagtagtt 
atgtgaaatt 
ctgttctgtc 
tcctgggctc 
gccaccatgc 
gtaatcccag 
agcctggcta 
ggtggcaggc 
ctgggaggca 
gtagtgacaa 
cctcccattt 
tcaagttgtt 
ttgtggaagt 
cactttttcg 
tctcaggttc 
gccaccacgc 
aggctggtct 
ggattacagg 
ccttaagaaa 
gatcttttga 
tgtgagatgg 
ctgcaagctc 
gactacaggc 



acagatttaa 
taaaaaaaaa 
cgtcttcaaa 
agggaaaaaa 
tccatgtgga 
ttctttaaat 
aaaagtaact 
ttacaagtcg 
gcccaggctg 
aagtgatcct 
ccagctaatt 
cactttggga 
acatggtgaa 
gcctgtaatc 
gagattgcag 
ggtgtcactg 
cggcctccca 
attaaagcat 
taggagatag 
ccccaggccg 
aagcaattct 
ccggctaatt 
tgaactcccg 
catgagccac 
gtataacttt 
gatttctttc 
agtcttgctc 
cacctcctgg 
gcctgccacc 



tttttatatt 
aaaacgaaca 
atgtaaagtt 
aaatcagaga 
cagtaatatt 
attttaataa 
ataaaatagt 
ttactctatt 
gagtacagtg 
cccaccttag 
ttaaaaaatt 
ggccgaggca 
accctgtctc 
ccacttactc 
tgagccgaga 
tgttgccagg 
aagtgctagg 
gtttacccac 
ggattctagc 
aagtgcagtt 
cccacgtcag 
tttgtagttt 
acctcaggtg 
cgcacccggc 
aggcctgttt 
agctctgata 
tgtcgcccgg 
gttcacgcca 
acgcctggct 



tttcttatat 
ttcacatagt 
tggtcttcaa 
agttaataat 
tctcttgtat 
atcatagtac 
acctgttttt 
tatttattga 
gcgtgatcat 
catcccaagt 
ctggggggcc 
ggcgaatcac 
tactaaaaat 
aggaggctga 
ctccatctca 
gctggtctca 
atcacaggca 
attatgcaca 
ctagcttttt 
gtgcgatctc 
cctcccgagt 
tagtagacac 
atccacccac 
ctctagcgta 
catctgtaaa 
attttgtgtg 
gctggagtgc 
ttctcctgcc 
aattttttgt 



gttcgaacag 
tcttaccagt 
tacatcagta 
attaactgtc 
tatctgtgct 
ttaaaatgtt 
ctgatcacat 
tttatttttt 
ggctcactgc 
agctgggact 
gaatgcggtg 
aaggtcagga 
acaaaaaatt 
ggcaggagag 
aaaaaaaaaa 
aacttctggg 
tgagtcactg 
tggtataatg 
atttttttgg 
ggctcactgc 
agctgggatt 
agggtttcac 
cttggcctcc 
acttttacat 
atgttaatgt 
tgtgtgtgtg 
agtggtacca 
tcagcctccc 
atttttagta 



79860 
79920 
79980 
80040 
80100 
80160 
80220 
80280 
80340 
80400 
80460 
80520 
80580 
80640 
80700 
80760 
80820 
80880 
80940 
81000 
81060 
81120 
81180 
81240 
81300 
81360 
81420 
81480 
81540 






gagacgaggt 
cgcctcagcc 
atttgtttcg 
taaaaagaat 
tcatgactca 
tctcaagggg 
gggcgcggtg 
ttgaggtcag 
tccctctcca 
gaagggcagg 
taaaaataca 
tgttgaggga 
accactgcac 
aaaaaaaaga 
aagttttgtt 
tagggcttaa 
tagttatatc 
caaaacagat 
aaaattgtaa 
ttaaggaagg 
gaataatgtt 
gagaaattgg 
ataggtaaga 
ataggaaaga 
ttaaagacta 
aatctcagat 
agatgaagtt 
cctaaaaact 



ttcaccgtgt 
tcccaaagtg 
attattctaa 
aaatgccacc 
ttactttggt 
tgttcattga 
gctcatgcct 
gagtttgaga 
aataggaaag 
cttaaagact 
aaaattagcc 
tgagaatcac 
tccagactgg 
aattttgata 
taggagagca 
agaatatgta 
cttgactgaa 
gaatttttaa 
gggacggtta 
gaaattaaat 
gcattttatg 
tgagggctct 
acatattttc 
tacgattgga 
tctaatgaat 
gactcacata 
gatagccttt 
gaatttctaa 



tagccaggat 
ctgggattac 
atctggtgac 
tagaggacag 
gtataaaatg 
ccttctggac 
gtaattccag 
tcagcctggc 
ataggattgg 
atctaatgaa 
aggcgtggtg 
ttgaacccag 
gctatagagc 
tttatgtgag 
cattccaact 
caatgtcttt 
gagctatttc 
agcacttaac 
gtagtactct 
tcctgtgtgc 
tgtataacag 
ttttgctgtg 
caactaagtt 
aacattattc 
ttagtaggac 
gcttggtctt 
tgtgtgaaag 
tagaatttga 



ggtctcgatc 
aggtctgagc 
atttcttttg 
aaaaattttt 
gcctttgtat 
tatctggaaa 
cacctcggga 
caactaagtt 
aaacattatt 
tttaccaata 
gtgggcgcct 
gaggcaaaag 
aaaactctgt 
aatgactttt 
tacttgcttc 
ttctctcccc 
attctcaagt 
caggctgtat 
cccctttctc 
tagattttca 
tataatgctt 
gctcgagaac 
attgactatt 
agaaggaaga 
ccactatatt 
taattaaagt 
aagagaaggg 
tggtgtaagt 



tcctgacctc 
cgccgcgccc 
tttttaagtt 
acagtagatt 
ggtgtcagca 
tattttgata 
ggccaagtta 
agttgactat 
cagaaggaag 
tggtaaaacc 
gtaatcccag 
tcacagtgag 
ctcaaaaaaa 
cacggtgttc 
tataaatata 
tagtcttccc 
cttaggaatg 
gaaatcacag 
aaaccaaatc 
acataaaatt 
tgttttaggt 
ttcaaccttc 
tgtgaaattt 
agttttaaag 
aataagtagt 
cttatacttg 
gagcacgatg 
gttgattatg 



gtgatccgcc 
ggcctagaat 
aaatcttcag 
atcacagacc 
cctgggaatg 
tttattggct 
ggtggatcac 
ttgtgaaatt 
aagttttaaa 
ccatctttac 
ctacctggga 
ccgagattgc 
aaaaggaaaa 
ttaatagcgc 
ccgtgtaatc 
cttttctcac 
cagggtgaag 
tctgttgtct 
tttggttgtt 
taaaaaactg 
gggagaagga 
tataattttt 
ccctctccaa 
aagggcaagc 
aaactagatt 
tatttcctct 
ctagtagacg 
atatttttaa 



81600 
81660 
81720 
81780 
81840 
81900 
81960 
82020 
82080 
82140 
82200 
82260 
82320 
82380 
82440 
82500 
82560 
82620 
82680 
82740 
82800 
82860 
82920 
82980 
83040 
83100 
83160 
83220 






tgtggcagca 
ttatatttgt 
tctgctggag 
gaggctgttc 
ttactcatgt 
tattttcttc 
aacagcttga 
ggtcagttct 
ttttattggg 
tcttgtgtgt 
aattcttggg 
tgtagtagtg 
ttttttctct 
ggtgcagtag 
gcagtgtgta 
tgagaacatg 
cacctccatc 
catagtattc
acttagattg 
tctttttggt 
tcaaaggtag 
gaactttaca 
ttttactttt 
gccattctga 
agtgatgatg 
attcatttcc 
ttgaattcct 
tgtttactct 
acttgtcaat 



ttttagtata 
tattactttt 
atgacagagt 
tcaggtaggg 
gtccatctta 
cagtactatc 
tatcaggaga 
tctgggttgc 
cttcatgaga 
ttgagcagtt 
catcctttcc 
taaaagtctg 
ctttactgga 
gtagtttttc 
ttgttcccat 
agctatttgg 
catgttgctg 
catggtgtgt 
attctatgtc 
ataatgatct 
atttgtttta 
ttcctaccag 
tcccgccaat 
ctggtgtgag 
agcatttttt 
tttttaccca 
tatagattct 
gttgatggtt 
ttttattttt 



ttttcctatt 
ctaaatgaat 
acttgtaatg 
agatttatat 
catattattt 
tctagcctct 
attagtctag 
attaaattat 
aaataatcaa 
tattcccttc 
caaggaattt 
aatccaaatt 
tgggttacgt 
aacccacacc 
gtttatgtcc 
tttactgttc 
caaaggacat 
acgtaccaca 
tgctgtcatg 
attttccttt 
tgttttttga 
cagtacatac 
ctgtagacaa 
atatctcatt 
catgtttgtt 
ttttttaata 
ggatatcaga 
ctttttgctg 
gttgcaattg 



aaatggccaa 
tgaaaaaaga 
ggtgcaacta 
ggaaatacat 
ccttactctc 
tgttaccaac 
taaaggttta 
ctttaaactt 
atcagggttc 
atactggaag 
acaatattgc 
gattttttca 
gggtatattg 
tcactgcttt 
atgtgtactc 
ccacattaat 
tatttcattc 
ttttctttat 
aatagtacag 
gggtatatac 
gaaatcttca 
gctccacaac 
tatgggattt 
gtgattttga 
gtcccctcgt 
gggttgtgtg 
ccattgttag 
tgcagaagct 
tgtttgggga 



ggttaaaaat 
ttttttgctt 
ataggccaca 
gcatttatta 
agttttaaga 
tacataagga 
gttacggttt 
gaaaattgat 
atggaaaatg 
gacggtttgg 
tccattgttt 
aatttgtaaa 
cacccaggta 
cttcccccat 
aatttttagc 
tggcgtagta 
ttttttatgg 
ccagtccacc 
caataaacat 
ccagtaatgg 
aactgctttc 
ctcaccaaca 
tttttgccat 
tttgcatttt 
atgtcttttg 
tttttagctt 
atgcagttct 
ctttagttta 
cttagccaaa 



acaaatatct 
gtaggtacag 
agagcttgat 
cagacaatat 
ctaaattcac 
ttttgaggtc 
taggcaaatt 
tcttaccgtc 
tgttttctgt 
gtcagcataa 
ctagattggc 
ataacttgat 
gtgagcatag 
gtagtagtcc 
tccccactta 
ttaaggcctc 
ctgcgtagta 
attgatggac 
gaaatgcatg 
gatttcaggg 
tacaatgact 
tctgttattt 
tttattaata 
tctgatggtt 
agaagtgtct 
gttcaattgt 
gtagattgtc 
attaggtccc 
atttctttct 



83280 
83340 
83400 
83460 
83520 
83580 
83640 
83700 
83760 
83820 
83880 
83940 
84000 
84060 
84120 
84180 
84240 
84300 
84360 
84420 
84480 
84540 
84600 
84660 
84720 
84780 
84840 
84900 
84960 






gaggctgatg 
cttacattta 
tccagtttca 
gagtcctttt 
gtgtggcttt 
ccagaattat 
tacagctttg 
atatgaattt 
aaataccatt 
caggcaagag 
cttcactggc 
agaactgata 
gtaccatttc 
tttacaatag 
cccagcactt 
ctgaccaaca 
ggtgggcgcc 
ggaggcatag 
agcaagaatc 
cctaggaata 
tgctaaaata 
gaagaatcaa 
ctgttttctt 
aattttcttc 
cctgttgcct 
aactcttctt 
ttcatcttct 
cactgggctc 



tcaagaagtg 
aatctttaat 
tttttctaca 
cctcattgct 
atttctcaat 
gctgtttggg 
ttctttttgc 
tagaatattt 
gaatctgtaa 
aaggaaacaa 
aatatgattc 
aatgatttta 
tatacgctaa 
ccacaaagaa 
tgggaggctg 
tggtgaaacc 
tgtaatccca 
gttgcagtga 
cgtttccaaa 
cggctgatga 
aatcagaggt 
tatcgttaaa 
ttgcacataa 
ttcatccttt 
ttatttctta 
tccttggtgg 
gttttatttc 
ttgtttcata 



tatttcctag 
ccaccttgag 
tatggctagc 
tgtttttgtc 
tttctgtcct 
ctgctgtgta 
ttaggattgc 
ttttctaatt 
attgctttgg 
aaggcatcca 
tatactttga 
ccccaagatt 
taatgtccag 
aatgaaataa 
aggcaggtgg 
ccgtctctac 
tctgctcggg 
gccgagatcg 
attaaaaaaa 
aggaggtgaa 
gacacaaata 
atgggaaatt 
attatgtgtt 
taaactctta 
ctgggttagt 
ttttttattc 
tttgagcatc 
gaaaacaatt 



ttttcttcta 
ttaatttttg 
cagtaatccc 
agtcttatca 
gttcctttgg 
gttcggttta 
tttgtctatt 
ttgtgaaaaa 
gcagtatagt 
aataggaaaa 
aaaccctaaa 
caggatacaa 
gctaagagtc 
tggctgggca 
atcacctgag 
taaaaataca 
aggctgaggc 
tgccactgca 
aaaaaagaaa 
agaatctgta 
aatggaaaaa 
tatcttaatt 
atgttgcgtc 
ggtctttttg 
atctaatctc 
attactttcc 
tgtatgttat 
ttttggcaag 



gcatttttat 
tatatggtga 
catgccattt 
aagatcagat 
tttgtgtgtc 
aagtcaggta 
tggactcttt 
caacattgat 
cattttacga 
gaagtcatca 
aattccgtca 
aatcagtgta 
aaatcaagaa 
cagtggctca 
gtcaggagtt 
aaaaattagc 
aggagaattg 
ctgcagcctg 
aaaaaaaaga 
tgaggagaag 
cactccatgc 
tttttcccta 
cttttgtttt 
ttttgttctg 
ctttctgcta 
tgacatctgt 
ttccataagc 
gacagaccac 



actttgagat 
aaagtaaggg 
attgaatagg 
ggttataggt 
tgtttttata 
acgtgatact 
ttttgcttcc 
agtttgatag 
ccagataaat 
tactctcact 
aaaggctact 
caaaaaatta 
cacagtccca 
tgcctgtaat 
cgagaccagc 
caggcgtggt 
cttgaaccca 
ggtgtgaaag 
caatggagta 
tgtaaagcac 
tcatggattg 
gttcattttt 
ccctatctgt 
ttttcccaag 
tatcttttgt 
tattctactt 
tcttacattt 
ttgaggccag 



85020 
85080 
85140 
85200 
85260 
85320 
85380 
85440 
85500 
85560 
85620 
85680 
85740 
85800 
85860 
85920 
85980 
86040 
86100 
86160 
86220 
86280 
86340 
86400 
86460 
86520 
86580 
86640 






gagtttgaga 
ttagctgggt 
tgaacccagg 
ggacagagca 
gtgggtcaca 
cagaccagcc 
aaaattaaat 
tttagtttgc 
ttattgtgtt 
ttttttttaa 
aaaaggagtt 
tcttgagtag 
cctacctaat 
actaaaaaat 
tataaagata 
gctatcaatt 
gagggactgg 
tctgacagtc 
gtaattttca
agtttgttga 
agattcaaga 
tattaaaaac 
aacattaact
tgcattccag 
ctttaaattc 
aacgggtata 
ttttagagca 
agctactttg 
agccagcaat 



ccagcctggc 
gtggtcacac 
aggcagaggt 
aggctctgtc 
cctgtaatcc 
tcgccaacgt 
ttttaaaaaa 
tctatggcaa 
ttgtttgctc 
atgagccctg 
ttcttggaaa 
tccatgtata 
aatgaagtat 
attggtcact 
attcgtctta 
agggtttata 
ccttattacc 
atgtccatga 
ctgccctagg 
tgaagtgtat 
agacagatct 
atatacatac 
agagttttaa 
tgccttgaat 
aaaattatat 
tgtgtcttta 
gaaacaagaa 
gaaaacagtt 
ttcactccta 



caacatagta 
acacctgtaa 
tgcagtgagc 
tcaaaaaaaa 
cagcactttg 
ggcaaaaccc 
ttaaattatt 
aatcttctca 
ttttttgggg 
tcctggtttt 
acctagtttt 
cactaaacta 
atgtgtatgt 
gcaagacatt 
gagtttcttt 
tggtctgcag 
ctttagggtg 
tctttagcgg 
cagttcacct 
gtttctgttt 
acttatatca 
ataaatgatt 
aagagtaaca 
attatatttt 
ttctaaaagt 
ccaaatgagg 
ctaccatctt 
tagtggtttc 
agaatttacc 



caaccctgtc 
tcccagctac 
caagacggcg 
aaagaaaaaa 
ggaggccaga 
catatctata 
ttatttcgtg 
tgcatgttct 
gcaggttttg 
tccttttttg 
ggaaggatat 
gcatgcagcc 
aaataagaga 
gtcttcccgc 
gttcattctc 
agccaggctt 
tgctatctct 
cagatccttt 
ttatgtattt 
ctattatcct 
gatattttgt 
aattctaaca 
aaaatatttt 
aaaagctttt 
gctggatttt 
aggtatgtat 
gacaatatta 
ttaaaaagat 
ccagagaaac 



tctgctaaaa 
tcaggaggct 
gcactgtact 
ttttttggtc 
tcacttgagg 
aaaaaaactt 
gaaagatact 
tcattttatc 
ttgttgttgt 
cttattattt 
tgtaggggaa 
tccctcactt 
tagggttgat 
cagactgtta 
accttccttg 
catatgcttg 
ttttgggagg 
gtgtgtcttc 
tagttccata 
tgctgatttt 
ttgttttaaa 
ataccgaatt 
tacattgata 
cctgtcattt 
tttttttagg 
ctgtgtttga 
agtcttccaa 
gtacataagc 
aaaaatgtat 



atacaaaaaa 
gataatcgct 
ctagcctggg 
aggcgtcatg 
ccgggaattc 
aaggataaaa 
tggtcataat 
tagtttatgt 
tttttttttt 
ttctttgaat 
gggataggta 
acagtgaagc 
ttattcttct 
gatgccttct 
ccattcctat 
tttaaatgtg 
aaaactgtac 
ctttgtcagt 
aatgttacct 
agtttttttc 
gcattactgc 
aaggatttta 
actaccaaaa 
gctgtttcag 
cgtttcatca 
attttttttg 
tccatggtac 
ttactgtcag 
gtccacacaa 



86700 
86760 
86820 
86880 
86940 
87000 
87060 
87120 
87180 
87240 
87300 
87360 
87420 
87480 
87540 
87600 
87660 
87720 
87780 
87840 
87900 
87960 
88020 
88080 
88140 
88200 
88260 
88320 
88380 






agacttgtac 
taaaatgttt 
ccattataca 
attataagtg 
agaaaacgca 
ctcaggctgg 
agtgattctc 
tggctaattt 
atccgccctc 
cagaaatcga 
ggaatgagga 
gtggtgatga 
ggtgaaattc 
tttattgcta 
gtgtctcacg 
aggaattcaa 
gaaattaacc 
ggaggattgc 
ccagcctggg 
tctgaaatta 
aaaatctgtt 
ggtgagtaat 
ttttttagtt 
ttctctcatc 
gaatctaaat 
ctttttataa 
taggcaacat 
atacctgtag 



aagaattttt 
attttggtga 
tgctgcatac 
aaagaagcaa 
aatttatttt 
agtgcagtgg 
ctgcctcagc 
tttgtgtttt 
ctcagcctcc 
aaatttctag 
ttaactgcaa 
ttgcacatct 
atagtagatg 
tgtttttatt 
cctataatcc 
gaccagcctg 
aggcatggtg 
ttgaacccag 
taacagcaca 
gactgaatga 
atgtaaacaa 
ttggatttgg 
atttaaagta 
ctctacctcc 
ttcactggct 
atttattaaa 
ggtaaaaccc 
tcccagctac 



atagcagcaa 
ataaacaaat 
tcacttcata 
gatgatatat 
ttatttattt 
cgtgatctcg 
ctcctgagta 
tagtagagac 
caaagtgctg 
aaacgtaaag 
acaggtatga 
ataaatgtat 
aattcatacc 
caagtgtggt 
caacactttg 
ggcaacatag 
gcacgtgcct 
gatatcgagg 
agaccctgtc 
tcatttttta 
ggaagtccat 
tttatcttac 
atcttaagta 
tttgcctttt 
atgtcctttg 
acctgtaagt 
catctctaca 
ttgggaggtt 



tattaataat 
gtggtatatt 
tattaagttt 
gtcgcatatc 
atttttttga 
gctcactgca 
gctgggactg 
gggtttcact 
ggattacagg 
cagatcagtg 
gtgaactaaa 
taaaactcat 
tctataaaac 
gaattattgc 
ggaggctgag 
ggagaccctg 
ttggtcccag 
ctatagtgag 
tcaatataaa 
atatttttca 
tgacccaaaa 
agcttttatt 
tgaaatgagt 
tcttacctct 
caagacgtga 
ggtattaaag 
aaaaataaaa 
gatgtgggag 



agccagaact 
catttaatgg 
tatgaatgaa 
ataggattct 
gatggagtct 
accttcacct 
caggcatgtg 
gtgttagcca 
catgagtcac 
ggttgtcagg 
aagtgttcta 
tgaattatat 
tggttttttg 
tatgttttta 
gtgggaggat 
tctctacaaa 
ttactcggga 
ctatgattgt 
aaaagaaaag 
gacaagacta 
agaactagca 
tattttttgt 
aattcattga 
tgttcttata 
tctaatgatg 
taatttaaac 
tgagtcagga 
gatcgcttca 



acaaatgatc 
atactgttat 
actccaaaac 
gtttatatgc 
tgctctgtcg 
cctgggttca 
ccaccaggcc 
ggatggtgtg 
cgtgcccggc 
gtgagagttg 
aaactggatt 
acttacaatg 
cagcaaaata 
tccaggtgtg 
cgcttgagcc 
aactttttta 
ggctgagatg 
gccctgcact 
aatcattaat 
cttttgctta 
caacttgcta 
aaataattct 
tcagaagact 
tatatatgtg 
atagtatatt 
atttacacct 
atggtggcac 
gcccgggagg 



88440 
88500 
88560 
88620 
88680 
88740 
88800 
88860 
88920 
88980 
89040 
89100 
89160 
89220 
89280 
89340 
89400 
89460 
89520 
89580 
89640 
89700 
89760 
89820 
89880 
89940 
90000 
90060 






tggaggttga 
cacactaaca 
ctcttttaat 
aatcttaaac 
ttttattccc 
gaataatttt 
ggaagtctct 
attgtccctt 
aattgtgttt 
atgtttacaa 
aatgattaaa 
ttaaaatcaa 
attatatgga 
gtggctcatg 
caggagttcg 
aaaaattagc 
aggagaatcg 
gcctgggcaa 
gtttcttcct 
tggcaaaaga 
agaactattt 
tttttttttc 
aatcatagtt 
cccagtagct 
cttgtagaga 
gagcagcctg 
ccttaaaacc 
gacccctgtt 
aaaagaaaac 



gcaacagagt 
cagatatttt 
aattttaatt 
tttatgaaca 
tgaaaaatgt 
aacaagaact 
tctactccag 
atcccatcaa 
atgtgtcagt 
tttaatgaag 
atttgaggca 
accctcataa 
gattttaaat 
cctgtaatct 
agaccagcct 
caggtgcggt 
ctccaggagg 
caagagcaaa 
tcccttcctc 
tgcagcactg 
attataccac 
ttttggagac 
cactgcagcc 
gggactacag 
cagggtttca 
ccttggcctc 
attcttagct 
ctaggcaaga 
tttaaacaaa 



gagaccctgt 
tatataaact 
atttgataat 
gtgtgaagta 
cataaagtaa 
ctgagttccc 
acctttctgt 
gagagtgttc 
tggccaccag 
cctctggaga 
ttactacatt 
ctatgggttt 
ctgcagagat 
cactactttg 
gaccaacatg 
ggcgcatgcc 
tggagatcac 
actccatctc 
agaatgactg 
ggtcctatcc 
cttagaagtt 
agggtctcac 
tcgaaccccc 
gcatgcacca 
ccatgttgcc 
ccaaagtgct 
cacagatcat 
ggtttcattt 
aaagaataat 



ctcaaaaaaa 
tgttcttggc 
tttaataatt 
gaaactgagt 
tataaagtat 
agatacctat 
ctatcccctt 
tttgccaaga 
tagtttgtag 
tagtatgcct 
tgttttcagc 
gctgacaaag 
gaaaataatt 
ggaggctgtg 
gcgaaacgcc 
tataatccca 
agtgagctga 
aaaaaaaagc 
atggatactc 
gaggtaggta 
taagaagtcc 
tctgttgccc 
caggctcagg 
tcatgcccag 
caagctggtc 
gggattacag 
acaaaaacag 
cttgatagta 
gtaaggaaaa 



attttaaaca 
atatgctgaa 
ttaatttcag 
ataggagatg 
tcaatgagta 
gacaaaagaa 
atttaaaaaa 
taaacgctaa 
atgttacctt 
taatctaaag 
aaatgggcct 
ggaaactagt 
tgctgtttca 
gcgggtggat 
atctatacta 
gctacctgag 
gatcatgcca 
gggaggggaa 
aggaagtgac 
tacaagagct 
aaaaaaatct 
aggctgtagt 
tgattctccc 
ataattattt 
ttgaactccc 
gcatgagcca 
tctatgagcc 
agtaagcagc 
tgttcttatc 



tttcatattg 
tacttttcag 
ctatttgaat 
cgctcatgaa 
tgtgagtatt 
caaattactt 
atgtacattg 
ttagagaagt 
tccaggtgac 
gaatggttct 
ttctattccc 
ttttactctt 
gctgggtgcg 
cacctgaggt 
aatacaaaaa 
aggctgaggt 
ttgcactcca 
ataatttgct 
ctaacagctt 
taaaacattt 
accaagagat 
gcagtggcac 
acccagcctt 
tattttattt 
gggctcaagt 
ccatgcccag 
agtagactgt 
agaaaactca 
tgttttctgg 



90120 
90180 
90240 
90300 
90360 
90420 
90480 
90540 
90600 
90660 
90720 
90780 
90840 
90900 
90960 
91020 
91080 
91140 
91200 
91260 
91320 
91380 
91440 
91500 
91560 
91620 
91680 
91740 
91800 






ctcttttcag 
aagttttcac 
tttttattca 
ttaggataaa 
tggttgatgc 
agctccatcc 
gttatcatag 
ttactagtta 
tgctactttg 
aaattttctt 
aatgcataaa 
tgggtttaaa 
acacctataa 
ttgagactag 
accaggcccg 
cacttgagcc 
agcctgggtg 
ggtggctcac 
aggagttcga 
attagccagg 
gtatcgcttg 
gagcaagact 
tgcagagtaa 
agttcccttt 
tccagttatt 
aaagttaaca 
atcattgtac 
tgcctgactt 



tattctgaca 
ttttttaatt 
ggttacttgg 
ctcacacaga 
tctcaggtat 
ctttctctca 
tgtctactgt 
aaaattacca 
caaaataaaa 
gctgctggag 
tgagttgtat 
aaaagattcg 
tcccagcact 
actgggcaac 
ttggcatgag 
tggaaggtgc 
ggtgacatag 
gcctgtaatc 
gaccagcttg 
cgtggttgtg 
aacctgggag 
ctgtctcaaa 
aaacaaaaaa 
caaatttgtg 
ccaaagaagg 
tgtgtctctt 
ttggttttgc 
ttatgtttta 



tcatttgaaa 
atatcttttg 
ttgattctga 
tattttcagc 
gcactcagat 
ttcaagaaag 
aatttgcatt 
gtaacacctc 
aatgtaatct 
cttaaatctt 
tttcttagaa 
catccattaa 
ttgggaggcc 
atagggagat 
cctgtaatcc 
aggcagcagt 
tgaggtgctg 
ccagcacttt 
gccaatatgg 
cgcacctgta 
gtgaggttgc 
aaaaaaaaaa 
agactaatgc 
aactattgtt 
aatattctct 
ttttttaata 
ccttcaacaa 
cagaactaaa 



ttaatgtgtc 
agggattgca 
aataccacga 
tacatttcca 
tattagctat 
tacaattgaa 
aaaagcctac 
ccgtagtgaa 
tgagagtata 
gttcagttag 
cgttttggct 
agtacagaaa 
aaggtgggag 
cctgtctcta 
cagctactcg 
gagccatggt 
tcttaaaaag 
gggaggccaa 
tgaaaccgca 
atcccagcta 
agtaagccga 
aaaaaaaaaa 
attttgtaaa 
tttgggcagt 
tctcagcatt 
taatgatttg 
tttcaactgc 
accagaacag 



taaaggaaat 
gcaaaatatt 
ccaattcttt 
cagccagcat 
gatgataaag 
tgtagttgta 
attatacaaa 
atagggtgct 
ttttgaaact 
tggatttaga 
attctaaggt 
atggccgggt 
gatcgtgtgg 
caaaaaaata 
ggagtctgaa 
catgccactc 
taaataaata 
gatgggtgga 
tctctactga 
ctcaggaggc 
gcactgcagc 
aaaaaagtaa 
gaacaagttg 
atgcaagaaa 
tataaattgt 
tactgaatag 
aaaatgtatg 
gtgaagaata 



tctataagag 
tattgctatg 
ttaggttagg 
cggtagtgga 
taatcataat 
cgagaaagat 
cctttttgtt 
gattaagaac 
ctggatgaac 
aacagtagct 
agacaaaatt 
gcagtggctc 
gcctaatagt 
taataaatta 
gtaggaggat 
actgcactcc 
ggccaggcac 
tcatgaggtc 
aaatacaaaa 
tgaggcagga 
ctgggcaaca 
atacataaag 
cattctttta 
ttgaacactt 
atttgctctc 
atacatgtag 
tatttttaag 
tgtctgccag 



91860 
91920 
91980 
92040 
92100 
92160 
92220 
92280 
92340 
92400 
92460 
92520 
92580 
92640 
92700 
92760 
92820 
92880 
92940 
93000 
93060 
93120 
93180 
93240 
93300 
93360 
93420 
93480 






tgaggtatag 
gacatacata 
cagtctggca 
gcctcagcct 
ttagtagaga 
gatccaccca 
ctgttttgtt 
agactggagt 
cattctcctg 
ctaatttttt 
tctcctgatc 
gccactgcgc 
caggctggag 
cagtcctctt 
actaattttt 
actcttggcc 
tttaatggat 
atatggaaac 
aaagtcattt 
ggtgcggtgg 
tgaggtcagg 
tacaaaaaaa 
cgaggcgggc 
ctgtctccac 
cagctacttg 
gagccgagat 
aaaaaaaaaa 
gaggcaggag 
ttgcactcta 



tattttacaa 
tatgaatgtg 
tgatatcggc 
cccaagtagc 
gggcatttca 
actcggcctc 
tgtttgtttg 
gcagtggtgc 
cctcagcctc 
gtacttttag 
tcgtgatctg 
ccggcctttt 
tgcagtggtg 
gtctcaaact 
taattttctg 
tcaagtaatc 
ttttctagtt 
ctatgaaata 
tctatcaaat 
ctcacgcctg 
agttcaagac 
ttaggccggg 
ggatcacgag 
taaaaaatac 
ggaggctgag 
cgtgccactg 
atagctgggc 
aatcacttga 
gcctgggcga 



tgatattttc 
tgtgtgtgtg 
tcactgcaac 
tacaggcatg 
ccatgttggc 
ccaaagtgct 
tttgtttgtt 
gatctcagct 
cctagtagct 
tagagacggg 
cctgcctcag 
tttttttttt 
caatcgtggc 
ttttagtagc 
tagagatggt 
ctcctgcctc 
gattagaagc 
gtcaaaattt 
catattacat 
taatcccagc 
cagcctggcc 
ggcggtggct 
gtcaggagat 
aaaaaaatta 
gcaggagaat 
cactccagcc 
atggtggcac 
acttgggagg 
cagagtgaga 



tttgtcttct 
gttttttttt 
ctccacctcc 
taccaccaag 
cagtctggcc 
gggattacag 
tgtttgagac 
cactgcaagc 
gggactacag 
gtttcaccgt 
cctcccaaag 
tttttgagac 
tgactgtagc 
tgggactaca 
gtctctccct 
agtctctcag 
tagaaaaaat 
tttttggtaa 
aaataccaaa 
actttgggag 
aacatggtga 
cacacctgta 
cgagaccatg 
ccggggcgtg 
ggcgtgaacc 
tgggcgacag 
gtgcctgtgg 
cagaggttgc 
cccagtctca 



atattgtaag 
tttttttttt 
tggattcaag 
cccagctaat 
tcagactcct 
gcttgagccg 
ggagtctcac
tctgcctccc 
gcgcccacca 
gttagccagg 
tgctgggatt 
acagggtctt 
cttgacctct 
ggcaagtgcc 
gttgcccagg 
agtgcttggg 
taactttgct 
tagacaaaac 
cagaagaaaa 
gccgaggtgg 
aaccctgtct 
atcccagcac 
ctggctaaca 
gtggcgggca 
tgggaagtgg 
agcaagactc 
tcctggctac 
agtgagccaa 
aaaaaaaaaa 



acatatataa 
ttttttgaga 
ctattctcct 
ttttgtattt 
gacctcaagt 
ctgcatctgg 
tctgtcgccc 
gggttcacgc 
ccacgcctgg 
atggtctctg 
acaggcgtga 
gctctgttct 
tggtctcaag 
acaacgcccc 
gtggtctcag 
actaattttt 
tcatttcaaa 
atatatacat 
ttacaggccg 
gcagatcact 
ctatcaaaaa 
tttgggaggc 
tggtgaaacc 
cctgtagtcc 
agcttgcagt 
catctcaaaa 
ttgggaagct 
gattgcccca 
aattacatga 



93540 
93600 
93660 
93720 
93780 
93840 
93900 
93960 
94020 
94080 
94140 
94200 
94260 
94320 
94380 
94440 
94500 
94560 
94620 
94680 
94740 
94800 
94860 
94920 
94980 
95040 
95100 
95160 
95220 






aacttgtggg 
gttctatttt 
actctggctc 
cacctcccag 
atgggccacc 
cactatgttg 
ccctaagtgc 
taatatggaa 
aaatataaaa 
cttgaattac 
cagatcaaaa 
ttgttattat 
taggtgttat 
tgcaaatact 
gggggaacca 
tttccctaat 
attcaagtac 
atgaggcacc 
tttttttttt 
tgatctcagc 
cccgagtagc 
tttttttttg 
ctcctgacct 
gccaccacgc 
tttattttta 
tggtacaatc 
agccttttga 
ttttgtagag 



ggtttataca 
gtttttcaat 
tgtcacccag 
gttcaagcga 
acgcccagct 
gccaggtggg 
aaggattaca 
gacctcctta 
ttgggtatgc 
agtcagcctt 
atacttgggg 
tccctaaaca 
aagtaatcta 
gcaccatttt 
atttccgaga 
tgcagcagct 
attttggagt 
atacttgaat 
gtgagatgga 
tcactgcaac 
tgggactaca 
tatttgtagt 
tgggatccgc 
ccggctgaaa 
ttttattttt 
atggctcact 
gttgctggga 
atggggtctc 



ataaaaataa 
aacctttatt 
gctgcagtgc 
ttctcctgcc 
aatttttttt 
tctcgaactc 
ggcgtgagcc 
tgagatgctg 
agaatggtca 
ttctatccgt 
gaaaaaagca 
atatagtata 
gagatgattt 
acataaggga 
gatactgagg 
gttgagggga 
ttgtttgttt 
tttttttttt 
gtcttgctct 
ctccgcctac 
ggtgcacgcc 
agagacagag 
ccaccttggc 
gcttttagtt 
tgggacagtg 
gcaccctcta 
ctacgggcat 
actatgttgc 



cctataacta 
tttttaatta 
agtggcacca 
tcagcctcct 
ttttttgtat 
ctgacctcag 
accatgcctg 
aagcatttca 
taactctgtg 
ggattctgca 
tctgtgttga 
aaaactattt 
aaagtataca 
ctttaagcgt 
gacagctgta 
cagtgaactg 
ttttttttca 
aagaaagctt 
gtttctaggc 
tgggttcaag 
accacgccca 
tttcaccatg 
ctcccaaagt 
ttctaactta 
tctcacttgg 
gctcctgggc 
gtaccaccac 
ctaggctggt 



tacgttaatc 
atttcttttt 
tcttggctca 
gcgtagctga 
ttttagtaga 
gcgatcgcca 
gccaatttat 
tttgaaaaaa 
agcaaaattt 
ttcatggatt 
acatgtacag 
acattgcatt 
ggaggattgt 
ttgcggattt 
tatttgtaac 
ttaacacaga 
ggcttttatt 
ttagttttct 
tggagtggag 
tgattctcct 
gctaattttt 
ttggccagga 
gctagggatt 
tttaatttaa 
ttgcccaggc 
tcaggcaatc 
actcagctaa 
ctcagattcc 



ctttaataat 
tttgagaagg 
ctgcagcctc 
gattacaggc 
gatggggttt 
gcctcggcct 
taccgtttct 
gttgcatgta 
tgaaatcagg 
caaccaactg 
acttttattc 
tacattgtat 
gtaggttgta 
tgctatctat 
ttatttttta 
taacaagtgt 
cgcctgtcgg 
ttctttcttt 
tgcagtggtg 
gcctcagcct 
tttttttttt 
tggtctcgat 
acaggcatga 
tttaatttaa 
tggagtgcag 
ctcctgcctc 
atttttaatt 
tcaagcattc 



95280 
95340 
95400 
95460 
95520 
95580 
95640 
95700 
95760 
95820 
95880 
95940 
96000 
96060 
96120 
96180 
96240 
96300 
96360 
96420 
96480 
96540 
96600 
96660 
96720 
96780 
96840 
96900 






ctcccacttg 
atttttaatc 
tatttattaa 
atttacaagg 
aggttatatt 
gctgttgttg 
cctgggttca 
ccactatgcc 
ctgatctgga 
ttgcaggtgt 
taatctagtg 
atgagttaat 
tccattaaat 
gtaatccatt 
taaaactgta 
tattgtcagt 
cacagtggct 
aggtcttgag 
caaaaattag 
aaggagaatt 
cactccagtc 
caacttatct 
atttttaagg 
ttaagtacct 
aacttttcta 
tatacattag 
ctatagactt 
tggttgttca 
ctcaggaatt 



cacctcccaa 
ccacacctaa 
aaatttagat 
aagaatatta 
gttacaatat 
cccaggctgg 
agtgattctc 
tggctaattt 
actcccaacc 
gagccaccat 
gttttggtat 
atttataaag 
aactttaaaa 
ctctcatttc 
tactagaatt 
taccggtgaa 
cacgcctgta 
ttcgagacca 
ccgggcgtgg 
gcttgaacct 
tgtgcgacag 
tttacagttt 
ttagtactaa 
tgtgtatcta 
cttatttccc 
tctcttacgt 
aaagtacaat 
tgcctataat 
cgagaccagc 



aatgctggga 
aaatataatt 
ggtaaaacta 
ctggtccctt 
ttggttctac 
agtgcaatgg 
ctgcctcagc 
tgtgtttttg 
tcaggagatc 
gcctggctct 
taataatttg 
cacttaaagt 
aatataaaac 
catttatggg 
ttatataccc 
tatatataaa 
atcctaccac 
gcctggccaa 
tggcacatgc 
ggggaggtgg 
gagcgagact 
ataatagtag 
aattgttggt 
atttaacttt 
cccttaactg 
taaaatatca 
tacatcaaca 
cccagcactt 
ctggataaca 



ttacaggtgt 
ttatccacca 
aaaattaaag 
ttgtgagcat 
tgtatacttt 
tgcaatctcg 
ctcctgagta 
tagagacggg 
cgcccacctc 
actatatact 
ataatgacct 
agctggcacg 
tgatagtggc 
atgagacgta 
attatttgat 
tttaggtaag 
tttgggaggc 
cgtggcgaaa 
ctgtaatccc 
aggttgcagt 
ccatctccaa 
aagttcaaat 
tataaattgg 
aagtccttta 
aaccagctac 
tataagtttc 
tcttttaaaa 
tgggaggctc 
tggtgaaatc 



gacaccgtgc 
ttttttaaaa 
cttaataaaa 
cccattaatt 
tttttttgag 
gctcactgca 
gctgggatta 
gtttcactat 
ggcctcccaa 
ttcattcagt 
ttagctgtta 
tagtaaacac 
attttattat 
aacacaagta 
gcaactttaa 
gaaaacccaa 
cgaggcagtt 
ccccgtctcc 
agctgctcgg 
gagctgagat 
aaaaaaaaaa 
aattggtttg 
ggtacaatat 
ttattttggt 
catctgcctt 
atatatatac 
ccttaatttc 
aggccattgg 
ctcatctcta 



cagacttgaa 
gtcataacat 
ctactgagtg 
atatatattc 
actgagtttc 
acctccgcct 
caggcatgcg 
gttggtcagg 
agtgctggga 
tgtttctttt 
ttgcttactt 
tatgtaaaga 
agagattaag 
gtttgctctc 
taccaaagtg 
cttggtcgca 
ggatcacctg 
actaaaaata 
gaggctgagg 
cgtaccactg 
aagaaaaacc 
gaatttctgt 
actttgtttt 
ttgataagac 
tttcctgttg 
acatatcaaa 
tggccaggcg 
atcccttgag 
caaaaattac 



96960 
97020 
97080 
97140 
97200 
97260 
97320 
97380 
97440 
97500 
97560 
97620 
97680 
97740 
97800 
97860 
97920 
97980 
98040 
98100 
98160 
98220 
98280 
98340 
98400 
98460 
98520 
98580 
98640 






aaaaagttag 
taagcctggg 
taacagtgag 
tagtaatatc 
ggaagaggca 
atggggtttt 
tatcactctg 
cctcccagac 
gcgccaccat 
ccaggctggt 
tgggattaac 
ttttggctaa 
gatatcaaag 
aaataccagt 
acaggtatac 
ttaatttttt 
tgcaatggca 
tgcctcagcc 
tagtggagac 
atccacccgc 
caaaattttt 
aaaaatactt 
ccaaaggcaa 
ggctgggcaa 
tcacctgagg 
aaaaatacaa 
gctgagacag 
ttgtgattct 



ctgggcgtgg 
aggcagaggt 
accctgtctc 
atgtttaata 
gtgtgtgtgt 
ctttggtggg 
tctcccaggc 
tcaagtgatc 
gcctagctaa 
ctctaactcc 
aggtgcgagc 
aatttttcag 
atgatatgaa 
tgttggagtt 
gctagaaatg 
tttttttttt 
cgatctcggc 
tcctgagtag 
ggggtttcac 
cttggcctcc 
attaattttt 
aggtgtaaat 
tttaaaagat 
agtggctcac 
ttaggagttc 
aaattagctg 
aagaattgct 
ttttttctct 



tggcgcacac 
tgcagtgagc 
aaaacaaaaa 
cccatgttac 
tacagttaaa 
gggttgctgg 
tgaaatacat 
ctcccacctc 
attttgtaca 
tgggctcaag 
cactgcagcc 
attagtttac 
gttagattgg 
ttgtttaaag 
atttttaaca 
tttttttgag 
tcacttacaa 
ctgggattac 
catgttggcc 
caaagtgcgg 
ctattgcctg 
atagatattc 
cagaaaataa 
gtctgtaatc 
aggaccaacc 
ggcatggagg 
tgaaccttgg 
ttgtctgtat 



ctatagtcct 
cgagatcatg 
tattaatttc 
attcattatt 
taggaaacaa 
gtttttgggg 
tggcatgctc 
agcttcccta 
ttttgtagag 
tgatccactt 
agcctatatt 
aagttacaag 
gtttttaaga 
ttctttaact 
caggtcattt 
agggagtctc 
cctccacctc 
aggcacctgc 
aggctggtct 
ggattacagg 
gactctgtga 
attaactcag 
gaccaaatta 
ccagcacttt 
tggccatcat 
catgtgcctg 
agggggaggt 
ttttgaactt 



agctacttgg 
ccactgcact 
tataacaaat 
cttctacact 
gtagggttca 
tttttttttt 
atggctcact 
gtagctgaga 
atgaggcttt 
gcttcagcct 
ttaactatat 
tgtaggtgat 
gtagttttta 
tcttattttt 
atgccaaact 
actcattgcc 
ccaggttcaa 
ctggctaatt 
cgaactgctg 
tgtgagccac 
acctatccat 
cattgtttta 
atataaaaat 
gggaggccaa 
ggcgaaaccc 
taatcccagc 
tgcatatctg 
ttctataaat 



gaggatctct 
ccagcctggg 
aaattttatt 
ttggccttta 
tatagtgctt 
tgagacaggg 
gcagcctcaa 
ctacaggcgt 
gccatgttgc 
cccaaagttc 
gtttttttct 
atctcatgga 
aaatacggat 
gggccaaaat 
gcattttgcc 
caagctggag 
gcgattctcc 
tttgtatttt 
acctcaggtg 
cgtgcttggc 
tttgcctttt 
atctatattt 
gcatacttta 
ggtgggcgga 
tgtctctact 
tactcgggag 
agtggtgaaa 
gattgtgttt 



98700 
98760 
98820 
98880 
98940 
99000 
99060 
99120 
99180 
99240 
99300 
99360 
99420 
99480 
99540 
99600 
99660 
99720 
99780 
99840 
99900 
99960 
100020 
100080 
100140 
100200 
100260 
100320 






tgtttttata 
agtaataaat 
aaatatattt 
cattaagaaa 
ttgatcattt 
gacttcactg 
tacatacgtt 
ctgcagaaca 
agaaacagcc 
agatgaacca 
taaggccttg 
acctgattct 
gttgcagatg 
tttgtcttct 
gtcttaacta 
tataaattca 
attttgagag 
ccttttattt 
agcaaatatg 
agatcagtct 
tgtagactgt 
aagagagcca 
gaactaagac 
gcaagtggaa 
aaaaactgaa 
ctcaaaagtt 
ttccatttgt 
agagactcat 
gtttactggt 



ttggaaaaat 
atattagtat 
aaatggctga 
cagcagcatc 
gtattgtcat 
aatccttgaa 
ggaacaagga 
ttttacttaa 
taagtttaca 
gaaaacagac 
ccttgatggt 
ggtcttcttt 
aaagtattcc 
gatgtttttt 
caggtgattg 
gtgtgccaaa 
tttcttcttc 
ttctattacc 
gtaaaataga 
tcatgtgacc 
taacttcttc 
attttaactg 
atttctaatt 
ctaccatctt 
tttttataca 
ctagttctag 
ttgtataaat 
atcatggcct 
taaaagtttg 



attatgcttt 
agcatttatt 
cataattttc 
attactttaa 
gtgcttttta 
aaaaataaaa 
ctttggagat 
aagaggaaac 
ggacttttta 
ttaaacaaaa 
cacagttatc 
accaatataa 
aggaacagtg 
cttaaaatag 
gaatgccaaa 
tgaaactttt 
ataaatctac 
ttgctaccaa 
gaaggtttga 
tgcagtattt 
ctgatggaat 
ctgtgaaaat 
tattgcttat 
ttattcttaa 
tggcatacat 
tctgttgatc 
atgcctggat 
tttaaatatt 
tttacagaac 



caaatgttaa 
aaggtttctt 
taagaataca 
tccatcattt 
aaaatctaga 
cgcagcgtca 
accactgttt 
acaagatctt 
gagtcttaca 
tatacaatgc 
ccaatggaca 
tcataatgta 
aatggtagaa 
taatttctcc 
cactcttaag 
ttcctaagta 
agacattaaa 
acagtttaga 
aggtttgagt 
ttttttctaa 
ttattttctg 
gtttccagtg 
tactttctta 
taattattaa 
ttttctagtt 
tgccttttgt 
tttcattata 
gtaataaagg 
ttttctctgg 



tacctatgaa 
gtgtagcaga 
tacacgtata 
cgttaaccac 
tgagaaatat 
gccctcaaac 
aaggaaatac 
caatgaacgt 
tatttgtgca 
aaatgtaatt 
ctaagttaga 
aataataatt 
gacacaagaa 
tacttttctt 
tttattttct 
actgtaatag 
caattgttgt 
tagcaatata 
tactctgtca 
tgtatttgtc 
caagaattat 
caagagaagg 
attttacagg 
tcccttcaat 
ccttctgctt 
tctcccaaaa 
aaaatgtcat 
caaatagata 
tgcttaaatg 



actaaacaca 
tcaacataga 
ttttttataa 
catatacctg 
tcgattatct 
tttagaagcg 
ctttgtaaac 
catcggctac 
ccaaacttga 
ttttgttgtt 
gcacaacaaa 
tgtatattgt 
catttgtttg 
ttctactgtt 
tttttcgttt 
gaaaaagttt 
gttcttttta 
atagcaaaaa 
tataacatgt 
agaaatctgt 
tctgatattt 
gaaatactag 
ataattataa 
gaaactttaa 
gctttattaa 
tgtacagtaa 
tgtagggagt 
tttgccctta 
atgctatgta 



100380 
100440 
100500 
100560 
100620 
100680 
100740 
100800 
100860 
100920 
100980 
101040 
101100 
101160 
101220 
101280 
101340 
101400 
101460 
101520 
101580 
101640 
101700 
101760 
101820 
101880 
101940 
102000 
102060 






aaatgtcatg 
tttcttaggt 
ttataatgaa 
aatatagaag 
gttgcccagg 
ttcaagcaat 
ccccagctaa 
ttgaactcct 
gcataagcca 
agccataagg 
atgagagttg 
taggtgatgg 
gataacgaaa 
gacatttcag 
tacttggata 
tattgtctcc 
aagatttcaa 
gaaaatcatt 
gagaactatt 
gaagggccat 
tcaataaaga 
ttagattcgg 
agcattagtg 
tgatatattg 
ggtaaaagct 
tatgtttgta 
atatacagcc 
ttttggtaaa 



agtggaaaga 
tttgaaagaa 
gacatacatt 
atgcatgatt 
ctggagtgca 
tctcctgcct 
ttttgtattt 
gacctcaggt 
ctgcgcccag 
taaatcatgt 
agataaatag 
gtagtatccc 
ggtagtaatg 
ttaagctcat 
actggctaat 
tttaaaacta 
atgttaaaag 
actgtataga 
ttctatgact 
gaaaatagag 
tagaagttgc 
tgttgagctc 
tgctcttcat 
gactttgagc 
tattctaaga 
ctccagaaga 
gtctttgttt 
atgaaccatt 



atatttgtag 
tacattaaaa 
cttcttaatt 
tctgggtttt 
atggcgcaat 
cagcctcccg 
ttagtagaga 
gatccgcctg 
ccagaagatg 
ctcttccaat 
gggaaaaaaa 
tttaaggtct 
aaatatatat 
aaaatttcat 
catattaaag 
tcatggttat 
agataaaagt 
agttgctttc 
taactctaac 
taatgatata 
tgaagttttc 
tgtgttgtat 
gttaatatgg 
caagggaaag 
cagtctgtcc 
attagaggaa 
atagtgtaga 
tacagttcgg 



tagtaacaag 
taaaaaactt 
ttactcttgc 
tttttttttt 
ctcgactcac 
agtagctggg 
tggggtttct 
cctcggcctc 
catgatttct 
catgactttg 
atttttttca 
caaacattac 
gatgaaaaga 
tgttttcatt 
gactatgtgg 
aattctattg 
caggttaata 
ctgatcaagt 
caagttttat 
gtaggagata 
tgaattaata 
tacttcctaa 
cagagttttg 
aatgagtact 
attgagaata 
aagcagatac 
attctttata 
ttttggactc 



aatttttcat
gcccctacta
tcttgttaaa 
tgagacagag 
cacaacctcc 
attacaggca 
ccatgttggt 
ccaaagtgct 
taggatcata 
gaactccctg 
agccagagct 
aacatcaatt 
attgagaagt 
taaaagatta 
ttccagctca 
ggaaagactt 
ctatcttaaa 
ctgaacttca 
tttaagctgt 
agggattggt 
atgacttaga 
aagataatgc 
taaactaaat 
atctttccag 
ttagatttct 
tagaattcta 
ttttgtacaa 
tgagtcaaag 



ttaggaaaga 
ggtaagaact 
gatttgtttg 
tttcgctctt 
gcctcccagg 
tgcgccacta 
caggctggtc 
gggattacag 
tgctgtttgt 
aataataaaa 
atgcatatgt 
atgaaatact 
tctaaattaa 
acgttattga 
acttttaata 
ttagataaca 
cactgagtca 
gctagtgcta 
ttctttgata 
ttggtctttt 
ttgtgacctt 
ttaaacatta 
taaaacttac 
atatcttaag 
gacttgcaaa 
atttaattac 
aaactaattc 
gattttcctt 



102120 
102180 
102240 
102300 
102360 
102420 
102480 
102540 
102600 
102660 
102720 
102780 
102840 
102900 
102960 
103020 
103080 
103140 
103200 
103260 
103320 
103380 
103440 
103500 
103560 
103620 
103680 
103740 






taaatgcttg 
aagggtacag 
ttattttgtc 
taatgaagga 
agaaagcaat 
ctataatgga 
gggagaatcg 
ttctgccaaa 
tacacctgta 
gaggaggttg 
tgtctcaaaa 
tttccagttt 
ggctgaggaa 
ggagagtgcc 
tggattccaa 
ttaaagtcta 
aagttccact 
cacgccttaa 
caaaatggaa 
atccattagt 
tcatcacagc 
aaattatgtg 
tactcaacag 
aatagcgtta 
agtttcagca 
ccgtaatcac 
gaccagcctg 
atgggtgatg 
aacctgggag 



tctcaatttt 
ttgcataaag 
tttacttttt 
gttgtgtgta 
gattgggtgg 
agagattcag 
cttacactca 
aaaaaaaaaa 
gtcccggtta 
cagtgagcca 
aataaaaagt 
gtgcagttca 
tcatttggga 
tcagaaagta 
aattgtgctg 
aaaaactcag 
aagaaggact 
ggaagaatga 
ataaacttac 
aatttaaaac 
atatatccaa 
atccttttga 
aaaatataca 
aaatatgtta 
gaaaaataag 
agcactttgg 
accaacatgg 
catgcccata 
gtagaggttg 



agtctggtct 
tgggttttta 
tactttaaaa 
ttggactctt 
ctctgcagag 
gctgggtgtg 
ggaattcaag 
aaaaaaaaac 
ctcaagaggc 
agatcacacc 
aaacgagatt 
cttggatata 
cttctagaac 
agctctaaaa 
gtgcagtgtg 
cagcgattgc 
gagtaaatac 
cagcatccta 
tgtaatgaag 
cagaacaaaa 
agctcagtac 
ctcaacttcc 
gaagcagatg 
aaagaattta 
aattgaaaac 
gaggccaagg 
tgaaacccgg 
atcctagcta 
cagtgaacca 



tttgtacttt 
tcctaatgta 
ctttttgata 
agtaacaatt 
caatcaaaac 
gtggctcatg 
accagcctgg 
taaactaaaa 
tgaggttaga 
accacactcc 
caccttaacc 
gaattcaggt 
acctgaaaat 
tctgcctaca 
attatttaaa 
agctgcccaa 
ttggtgtttc 
gaactaaagg 
cttaaaatca 
cacagcattc 
ataataaaca 
ctgatggaaa 
cacaagtgac 
caagaagaga 
ccaacagagg 
caggtggatc 
tctctactaa 
cttgggaggc 
agatcatgcc 



tcttcagaag 
ttggaaataa 
ttttaggggt 
ataaacgctt 
aaggtagaaa 
cctgtttggg 
ggaatatagg 
attagccagg 
ggatcgctta 
agcccaggtt 
agcttttacc 
agaatgtgac 
tagaagagaa 
tattcctttc 
ggaacccagg 
ggacagagag 
ccattgaacc 
ctgtgcttca 
aatctcacag 
ttgagaagaa 
gttactaagc 
aaaaaaagtg 
ctctatgaga 
taatgggtga 
ccaggcaccg 
acctaaggtc 
aaatacaaaa 
tgaagcagga 
attgcactcc 



aaatgaatta 
atgataaact 
tggagtctga 
aacaaaatat 
ctgcaaagtc 
aggccagtat 
gagaccctgt 
tttggtggcc 
agcccgggag 
acagagaacc 
catagggcaa 
agttttgaca 
attttgggaa 
aaatccttgg 
agaaagcaag 
tttgaaattc 
cccaaaatgc 
gcactaagga 
catcaaggtt 
aacaattcag 
atgcaaagaa 
gtccatacca 
gtgattttaa 
atagataggg 
tgcctcacgc 
aggagttcaa 
ctagccgggc 
gaattgcttg 
agcctgggca 



103800 
103860 
103920 
103980 
104040 
104100 
104160 
104220 
104280 
104340 
104400 
104460 
104520 
104580 
104640 
104700 
104760 
104820 
104880 
104940 
105000 
105060 
105120 
105180 
105240 
105300 
105360 
105420 
105480 
agaagagcaa
cgcctgtaat 
agaccatcct 
ggcatggtgg 
tgaccccagg 
caaccgtctc 
aagtcttgta 
gtaaactagg 
agagaaagaa 
atttgaagta 
cgtgaaattt 
atccccaagc 
aaaaaaaaaa 
tcagataaac
aatgacatct
aaaatatcct
aacactttgg
acctagtgag
tgcactccag
ccagatgaaa
tcttcaggct
tttccttgga
ggaattaaga
tatttttcta
tatatgtata
gtaattatat
gtagattgtg
ctaaaaagcc 



aactccatct 
cccagcactt 
ggctaacatg 
cgggcacctg 
aggcggaact 
aaaaataata 
atgaaaacta 
cacaacagga 
aaaagaaata 
aagaaggaaa 
tccccaaagt 
agaataaata 
aaaatcttaa 
aagagtgatg 
ttaaggtgca 
tcaagatgtt 
ggggctgaga 
aaccccatca 
cctgggggac 
acagaatttg 
gaaggaaaat 
tggacttatc 
attttgaaat 
aatatcattg 
agtaaaatat 
tgagataata 
ataagaatgc 
attagagaag 



caaaaaataa 
tgggaggccg 
gtgaaacccc 
tagtcccagc 
tgcaatgagc 
ataataataa 
cagtatctga 
aagactgaaa 
gcagaggttt 
ggaagagaat 
gttgaaagac 
caacaaagac 
aagcaatcaa 
gctaacctca 
aagatgaaac 
aaggcaatgg 
caggagaatc 
aaaaataaaa 
agagtcccta 
ttgctggcag 
ataaatctgg 
agtaaatgga 
gataaacatg 
ctgtttaaag 
aggacagaaa 
ttactatatt 
attttgtaat 
ataaaatggg 



taataataat 
agacaggcag 
gtctctacta 
tactcgggag 
ccagatcgcg 
ttagaaaata 
aattaaaaat 
aagacaagta 
gagacctttg 
ggggcaagag 
atcaacgtac 
tacatctggg 
ggtggggagt 
tcagaaatga 
aaaggttaac 
ccaggctcag 
ccttgaggcc 
ttagctgggc 
tctcttgaaa 
actgacttta 
agtgaaaaac 
tttcccaatg 
ttggtaaata 
caaacaaaaa 
tagcatgaag 
gtacacaaag 
ctctcacccc 
tcactaagaa 



tggctgggcg 
atcacaaggt 
aaaatacaaa 
gctgaagcag 
ccactgcact 
ataaaaaata 
ttaaatggat 
agtatcaaac 
gaacattatc 
aaatctttga 
agatctaaga 
tacatcacat 
gggaaggagt 
tggaattaag 
ttaaaactct 
tggcttatgc 
atgagtgacc 
atgggctgtg 
aacttaagac 
tacggaatgc 
gtagtatctg 
ggaacctgta 
agaaagacat 
tagtattgca 
ggtcagagag 
tgagatagca 
atcactgggg 
ttatgtgatt 



cggtggctca 
caggagattg 
aaattagcct 
gagaatggca 
ccagcctggg 
acccactgga 
ggacttaaca 
caaagcagag 
agttcctgta 
agcaatgatg 
agttcactga 
tgccaaaaaa 
agcattacat 
aaaatgatgg 
gtatccagtt 
ctgtaatccc 
agcctgggca 
gtcgtgctac 
aaaatctttt 
tttaagaaat 
aaattaaaat 
tctccaggat 
ttttcttttt 
gtgtttataa 
ggcaataaat 
ttagtgtaag 
tgcagtagat 
agtcaaatga 



105540 
105600 
105660 
105720 
105780 
105840 
105900 
105960 
106020 
106080 
106140 
106200 
106260 
106320 
106380 
106440 
106500 
106560 
106620 
106680 
106740 
106800 
106860 
106920 
106980 
107040 
107100 
107160 
aatcaaggaa
gaaattaata 
tttttttcag 
cagcagtggc 
cctcctgcct 
ccagaactat 
atgattcgtt 
ttagagcaaa 
agaggaaacc 
ctccagcaac 
tgttagggaa 
aaccttgcca 
cagatcaatg 
agacctaata 
gacagattga 
caagcacaga 
gcacaccaaa 
atacagaaag 
agtgaattca 
cacgcatata 
tttgagacca 
ccaggtgtgg 
acttgaacct 
gggcaacaga 
acacgtgtgt 
agcagttatg 
ttcactgtct 
aacagaaatt 
gtttggtttc 



agaaaaacag 
atgtcactaa 
ggggttgagg 
gtaatcagtc 
cggcctccca 
tgttttaaag 
aggagcactc 
aggatatgaa 
aagcaccagc 
aagtgacagc 
tagagggaat 
gcaggacttt 
tgcttacatt 
gggtgaagac 
ggcagttatt 
aaggggtgtc 
gacttgatgg
gggggaaact
tggtttaaaa
atcccagcac 
gcctggccaa 
tggtacacac 
gggaggcaga 
gcgagatcct 
atgtgtccaa 
ctccaataat 
tagtctgctt 
tatcctcaca 
tagtgagggc 



agcaaagaag 
attgcttacc 
ttttttttta 
tcaaaatccc 
aaatgctggg 
tcattaggtg 
acaggactca 
gcaaattcag 
ttctaagggt 
atgtgaaagt 
ctcccctcaa 
ctggggatag 
gggtagaggg 
agtgtctaca 
cacatggtgg 
catatggggt 
aataaagtga 
agaatgaaac 
tctataaaaa 
tttgggaggc 
cactgcgaaa 
ctgtagtccc 
ggttgcagtg 
gcctcaaata 
atatgtatat 
aatgagcaca 
gggctgcatt 
gttctagaag 
tttcttcctg 



tgtaagataa 
ttcctccttc 
gagacagagt 
tgcagtctca 
attacaggca 
gttaagccct 
gcatatagtc 
caaagggaaa 
tctctcctaa 
gttaagcctc 
attcaagttc 
cagtcccagg 
gacctatgga 
atgatggaca 
ataggggcaa 
aagggtgtca 
atatactaac 
ctggagttca 
tagatgaaat 
cgaggcgggc 
ccccatctct 
agctacttgg 
ggctgagatc 
aaaatagagg 
tccctagtct 
taaagtaccc 
acaaaatacc 
cttggaagtc 
gcttgcagat 



aacagtggga 
ctgaactgtt 
ctctattgcc 
aactcctggg 
ggcagagcca 
cgagactatg 
ttattcccaa 
aaaggaatta 
tgaagtcaca 
actgttagac 
ccagatgcct 
cacgtgtttg 
agtccaaatt 
gctaggtatg 
cctggaatga 
gcagagatgg 
cacagagaag 
gcttgaattg 
atcggctggg 
agatcgcttg 
agtaaaaata 
gaggctgagg 
gcaccactgc 
acatataagc 
gtccaccaag 
atatcttgcc 
atagactggg 
caagattaaa 
ggccaccttc 



tggaagagct 
gtggattttt 
ccaggctgga 
ctcaagtgat 
ccacaccctg 
tgcaagtttg 
caatgattta 
agtgaagtcc 
caggatgtca 
atagtaagct 
ggcaagggcc 
cacagtggtt 
tgggtgtcag 
aggtgtcagg 
ggaataagct 
tagattggtt 
gtaattataa 
ggtttaagaa 
cacagtggct 
agatcaggag 
caaaaattag 
cacgagaatc 
actccagcct 
atataaatat 
gtggccttgg 
ttccaaattc 
cagcttaaat 
gtaccagcca 
tcaccgtgtc 



107220 
107280 
107340 
107400 
107460 
107520 
107580 
107640 
107700 
107760 
107820 
107880 
107940 
108000 
108060 
108120 
108180 
108240 
108300 
108360 
108420 
108480 
108540 
108600 
108660 
108720 
108780 
108840 
108900 
cttgtatggc agacagcaca agctctctgg tgtccctttt taaaagggca ttaatcccat 108960
catgacagtc ccatcctcat tatctcatct aaccctaggt acttcccaaa ggctgaatca 109020
ccaaagacca tcacattgct ggtgaaggct tcaacatatg aatttgaggg acacgaatat 109080
tcagtccata acatcaacta aaggaaccaa gactctttga taaaatggct aaattcaggg 109140
ctggggcaga gaaaatacat gagtgtggaa cttcttgtgc cagagagaaa aagtgcccaa 109200
agattgatga ggatgaatca ttgaaatgac acacagatta aaagggttcc cactggacaa 109260
atttgagcat caaaataagt aatagtagta attaattata acccatcaga agaaataaac 109320
catgagctca tgtgaatata tgaatacaaa cataaacaaa ttacaagcat aatgaggaat 109380
gtgatattta tatggtttaa aggtacctct ccaggccggg tgcagtaact ctcacctgta 109440
atcccagcac tttgggaggc caaggcaggt agatcacctg aggtcaggca tttgagacca 109500
gcctgcacaa catggtgaaa ccctgactct actaaaaata cataacgcga gccgggcgtg 109560
gtggcacgtg tctataatct gccactgatt aggtgtgtga ttttcccaag caggggataa 109620
tagtagtacc tatgtcaaag gctgttatga ggattaaatg agctaacaca taatcgtgct 109680
tttttttttt tttttttttt ttgagacaga gtcttgcact gtcgcctggg ctggagtgca 109740
atggcacgat ctcggcccac tgcaacctct gcctcccagg ttcaagtgat tctcctgcct 109800
cagcctcctg agtagctggg attacaggct cctgccacca cacctggcta ttttcaatag 109860
agacggggtt tcactatgtt ggccaggcta gtctcaaaaa cctgacctcg tgatccaccc 109920
gctttggcct cccaaagtgc tgggattaca ggcatgagcc actgcacccg gctttttttt 109980
tttttttttg agatggaatc 110000


<210> 2 

<211> 3263 

<212> DNA 

<213> Homo sapiens 












<400> 2 
gctcctgaga ccggcgggca cacgggggtc tgtggccccc gccgtagcag tggctgccgc 60
cgtcgcttgg ttcccgtcgg tctgcgggag gcgggttatg gcggcggcgg cagtgagagc 120
tgtgaatgaa ttctccgggt ggacgaggga agaagaaagg ctccggcggc gccagcaacc 180
cggtgcctcc caggcctccg cccccttgcc tggcccccgc ccctcccgcc gccgggccgg 240
cccctccgcc cgagtcgccg cataagcgga acctgtacta tttctcctac ccgctgtttg 300
taggcttcgc gctgctgcgt ttggtcgcct tccacctggg gctcctcttc gtgtggctct 360
gccagcgctt ctcccgcgcc ctcatggcag ccaagaggag ctccggggcc gcgccagcac 420
ctgcctcggc ctcggccccg gcgccggtgc cgggcggcga ggccgagcgc gtccgagtct 480
tccacaaaca ggccttcgag tacatctcca ttgccctgcg catcgatgag gatgagaaag 540
caggacagaa ggagcaagct gtggaatggt ataagaaagg tattgaagaa ctggaaaaag 600
gaatagctgt tatagttaca ggacaaggtg aacagtgtga aagagctaga cgccttcaag 660
ctaaaatgat gactaatttg gttatggcca aggaccgctt acaacttcta gagaagatgc 720
aaccagtttt gccattttcc aagtcacaaa cggacgtcta taatgacagt actaacttgg 780
catgccgcaa tggacatctc cagtcagaaa gtggagctgt tccaaaaaga aaagacccct 840
taacacacac tagtaattca ctgcctcgtt caaaaacagt tatgaaaact ggatctgcag 900
gcctttcagg ccaccataga gcacctagtt acagtggttt atccatggtt tctggagtga 960
aacagggatc tggtcctgct cctaccactc ataagggtac tccgaaaaca aataggacaa 1020
ataaaccttc tacccctaca actgctactc gtaagaaaaa agacttgaag aattttagga 1080
atgtggacag caaccttgct aaccttataa tgaatgaaat tgtggacaat ggaacagctg 1140
ttaaatttga tgatatagct ggtcaagact tggcaaaaca agcattgcaa gaaattgtta 1200
ttcttccttc tctgaggcct gagttgttca cagggcttag agctcctgcc agagggctgt 1260
tactctttgg tccacctggg aatgggaaga caatgctggc taaagcagta gctgcagaat 1320
cgaatgcaac cttctttaat ataagtgctg caagtttaac ttcaaaatac gtgggagaag 1380
gagagaaatt ggtgagggct ctttttgctg tggctcgaga acttcaacct tctataattt 1440
ttatagatga agttgatagc cttttgtgtg aaagaagaga aggggagcac gatgctagta 1500
gacgcctaaa aactgaattt ctaatagaat ttgatggtgt acagtctgct ggagatgaca 1560
gagtacttgt aatgggtgca actaataggc cacaagagct tgatgaggct gttctcaggc 1620
gtttcatcaa acgggtatat gtgtctttac caaatgagga gacaagacta cttttgctta 1680
aaaatctgtt atgtaaacaa ggaagtccat tgacccaaaa agaactagca caacttgcta 1740
gaatgactga tggatactca ggaagtgacc taacagcttt ggcaaaagat gcagcactgg 1800
gtcctatccg agaactaaaa ccagaacagg tgaagaatat gtctgccagt gagatgagaa 1860
atattcgatt atctgacttc actgaatcct tgaaaaaaat aaaacgcagc gtcagccctc 1920
aaactttaga agcgtacata cgttggaaca aggactttgg agataccact gtttaaggaa 1980
atacctttgt aaacctgcag aacattttac ttaaaagagg aaacacaaga tcttcaatga 2040
acgtcatcgg ctacagaaac agcctaagtt tacaggactt tttagagtct tacatatttg 2100
tgcaccaaac ttgaagatga accagaaaac agacttaaac aaaatataca atgcaaatgt 2160
aattttttgt tgtttaaggc cttgccttga tggtcacagt tatcccaatg gacactaagt 2220
tagagcacaa caaaacctga ttctggtctt ctttaccaat ataatcataa tgtaaataat 2280
aatttgtata ttgtgttgca gatgaaagta ttccaggaac agtgaatggt agaagacaca 2340
agaacatttg tttgtttgtc ttctgatgtt ttttcttaaa atagtaattt ctcctacttt 2400
tcttttctac tgttgtctta actacaggtg attggaatgc caaacactct taagtttatt 2460
ttcttttttc gttttataaa ttcagtgtgc caaatgaaac ttttttccta agtaactgta 2520
ataggaaaaa gtttattttg agagtttctt cttcataaat ctacagacat taaacaattg 2580
ttgtgttctt tttacctttt atttttctat taccttgcta ccaaacagtt tagatagcaa 2640
tataatagca aaaaagcaaa tatggtaaaa tagagaaggt ttgaaggttt gagttactct 2700
gtcatataac atgtagatca gtcttcatgt gacctgcagt attttttttt ctaatgtatt 2760
tgtcagaaat ctgttgtaga ctgttaactt cttcctgatg gaatttattt tctgcaagaa 2820
ttattctgat atttaagaga gccaatttta actgctgtga aaatgtttcc agtgcaagag 2880
aagggaaata ctaggaacta agacatttct aatttattgc ttattacttt cttaatttta 2940
caggataatt ataagcaagt ggaactacca tcttttattc ttaataatta ttaatccctt 3000
caatgaaact ttaaaaaaac tgaattttta tacatggcat acatttttct agttccttct 3060
gcttgcttta ttaactcaaa agttctagtt ctagtctgtt gatctgcctt ttgttctccc 3120
aaaatgtaca gtaattccat ttgtttgtat aaatatgcct ggattttcat tataaaaatg 3180
tcattgtagg gagtagagac tcatatcatg gccttttaaa tattgtaata aaggcaaata 3240
gatatttgcc cttagtttac tgg 3263


<210> 3 
<211> 616 
<212> PRT 
<213> Homo sapiens












<400> 3 















Met Asn Ser Pro Gly Gly Arg Gly Lys Lys Lys Gly Ser Gly Gly Ala
1 5 10 15

Ser Asn Pro Val Pro Pro Arg Pro Pro Pro Pro Cys Leu Ala Pro Ala
20 25 30

Pro Pro Ala Ala Gly Pro Ala Pro Pro Pro Glu Ser Pro His Lys Arg
35 40 45

Asn Leu Tyr Tyr Phe Ser Tyr Pro Leu Phe Val Gly Phe Ala Leu Leu
50 55 60

Arg Leu Val Ala Phe His Leu Gly Leu Leu Phe Val Trp Leu Cys Gln
65 70 75 80

Arg Phe Ser Arg Ala Leu Met Ala Ala Lys Arg Ser Ser Gly Ala Ala
85 90 95

Pro Ala Pro Ala Ser Ala Ser Ala Pro Ala Pro Val Pro Gly Gly Glu
100 105 110

Ala Glu Arg Val Arg Val Phe His Lys Gln Ala Phe Glu Tyr Ile Ser
115 120 125

Ile Ala Leu Arg Ile Asp Glu Asp Glu Lys Ala Gly Gln Lys Glu Gln
130 135 140

Ala Val Glu Trp Tyr Lys Lys Gly Ile Glu Glu Leu Glu Lys Gly Ile
145 150 155 160

Ala Val Ile Val Thr Gly Gln Gly Glu Gln Cys Glu Arg Ala Arg Arg
165 170 175

Leu Gln Ala Lys Met Met Thr Asn Leu Val Met Ala Lys Asp Arg Leu
180 185 190

Gln Leu Leu Glu Lys Met Gln Pro Val Leu Pro Phe Ser Lys Ser Gln
195 200 205

Thr Asp Val Tyr Asn Asp Ser Thr Asn Leu Ala Cys Arg Asn Gly His
210 215 220

Leu Gln Ser Glu Ser Gly Ala Val Pro Lys Arg Lys Asp Pro Leu Thr
225 230 235 240

His Thr Ser Asn Ser Leu Pro Arg Ser Lys Thr Val Met Lys Thr Gly
245 250 255

Ser Ala Gly Leu Ser Gly His His Arg Ala Pro Ser Tyr Ser Gly Leu
260 265 270

Ser Met Val Ser Gly Val Lys Gln Gly Ser Gly Pro Ala Pro Thr Thr
275 280 285

His Lys Gly Thr Pro Lys Thr Asn Arg Thr Asn Lys Pro Ser Thr Pro
290 295 300

Thr Thr Ala Thr Arg Lys Lys Lys Asp Leu Lys Asn Phe Arg Asn Val
305 310 315 320

Asp Ser Asn Leu Ala Asn Leu Ile Met Asn Glu Ile Val Asp Asn Gly
325 330 335

Thr Ala Val Lys Phe Asp Asp Ile Ala Gly Gln Asp Leu Ala Lys Gln
340 345 350

Ala Leu Gln Glu Ile Val Ile Leu Pro Ser Leu Arg Pro Glu Leu Phe
355 360 365

Thr Gly Leu Arg Ala Pro Ala Arg Gly Leu Leu Leu Phe Gly Pro Pro
370 375 380

Gly Asn Gly Lys Thr Met Leu Ala Lys Ala Val Ala Ala Glu Ser Asn
385 390 395 400

Ala Thr Phe Phe Asn Ile Ser Ala Ala Ser Leu Thr Ser Lys Tyr Val
405 410 415

Gly Glu Gly Glu Lys Leu Val Arg Ala Leu Phe Ala Val Ala Arg Glu
420 425 430

Leu Gln Pro Ser Ile Ile Phe Ile Asp Glu Val Asp Ser Leu Leu Cys
435 440 445

Glu Arg Arg Glu Gly Glu His Asp Ala Ser Arg Arg Leu Lys Thr Glu
450 455 460

Phe Leu Ile Glu Phe Asp Gly Val Gln Ser Ala Gly Asp Asp Arg Val
465 470 475 480

Leu Val Met Gly Ala Thr Asn Arg Pro Gln Glu Leu Asp Glu Ala Val
485 490 495

Leu Arg Arg Phe Ile Lys Arg Val Tyr Val Ser Leu Pro Asn Glu Glu
500 505 510

Thr Arg Leu Leu Leu Leu Lys Asn Leu Leu Cys Lys Gln Gly Ser Pro
515 520 525

Leu Thr Gln Lys Glu Leu Ala Gln Leu Ala Arg Met Thr Asp Gly Tyr
530 535 540

Ser Gly Ser Asp Leu Thr Ala Leu Ala Lys Asp Ala Ala Leu Gly Pro
545 550 555 560

Ile Arg Glu Leu Lys Pro Glu Gln Val Lys Asn Met Ser Ala Ser Glu
565 570 575

Met Arg Asn Ile Arg Leu Ser Asp Phe Thr Glu Ser Leu Lys Lys Ile
580 585 590

Lys Arg Ser Val Ser Pro Gln Thr Leu Glu Ala Tyr Ile Arg Trp Asn
595 600 605

Lys Asp Phe Gly Asp Thr Thr Val
610 615



<210> 4 
<211> 23 
<212> DNA 

<213> Artificial sequence 



<220> 

<223> Primer 



<400> 4 

cggagctcct cttggctgcc atg 

<210> 5 
<211> 26 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 5 

agaagcgctg gcagagccac acgaag 

<210> 6 
<211> 27 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 6 

aaggcgacca aacgcagcag cgcgaag 

<210> 7 
<211> 26 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 7 

aggagcaagc tgtggaatgg tataag 

<210> 8 
<211> 27 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 8 

tggttatggc caaggaccgc ttacaac 

<210> 9 
<211> 26 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 



<400> 9 

caaacggacg tctataatga cagtac 

<210> 10 
<211> 25 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 10 

ttaggaatgt ggacagcaac cttgc 

<210> 11 
<211> 25 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 11 

cttctctgag gcctgagttg ttcac 

<210> 12 
<211> 27 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 12 

tgctagaatg actgatggat actcagg 

<210> 13 
<211> 24 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 13 

agatgcagca ctgggtccta tccg 

<210> 14 
<211> 26 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 



<400> 14 



atgaacgtca tcggctacag aaacag 

<210> 15 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 15 

tagcagtggc tgccgccgt 

<210> 16 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 16 

aagcggtcct tggccataac 

<210> 17 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 17 

ggcggcagtg agagctgtg 

<210> 18 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 18 

ctagctcttt cacactgttc 

<210> 19 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 19 

aacaggcctt cgagtacatc 
<210> 20
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 20 

ctgtgaacaa ctcaggcctc 

<210> 21 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 21 

atgagaaagc aggacagaag 

<210> 22 
<211> 18 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 22 

tgccaagtct tgaccagc 

<210> 23 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 23 

ctacaactgc tactcgtaag 

<210> 24 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 24 

cagtgctgca tcttttgcc 



<210> 25 
<211> 20 



<212> DNA 

<213> Artificial sequence 



<220> 

<223> Primer 
<400> 25 

taggaatgtg gacagcaacc 

<210> 26 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 26 

aaagctgtta ggtcacttcc 

<210> 27 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 27 

tggagatgac agagtacttg 

<210> 28 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 28 

ctggaatact ttcatctgc 

<210> 29 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 29 

atgaggctgt tctcaggcg 

<210> 30 
<211> 20 
<212> DNA 

<213> Artificial sequence 



<220> 

<223> Primer 
<400> 30 

gtgagccgaa ctgcacattg 

<210> 31 
<211> 21 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 31 

caaagtcgac agctacagtg c 

<210> 32 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 32 

ggaactgtag ttgagtggga 

<210> 33 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 33 

agatgaggct ccgacctac 

<210> 34 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 34 

aatgccacac ttgtaatctc 

<210> 35 
<211> 22 
<212> DNA 

<213> Artificial sequence 



<220> 



<223> Primer 
<400> 35 

tgtgaatata tcataatttg gg 

<210> 36 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 36 

tacagcagtt ctcatgatg 

<210> 37 
<211> 21 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 37 

gaccaaattg gtgcatgcat g 

<210> 38 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 38 

acatttccaa tacatcccac 

<210> 39 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 39

atttgtcatt tcacatgcac 

<210> 40 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 



<400> 40 

ttagaatgac tatacctgac 



<210> 41 
<211> 18 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 41 

tcaggttaag taagactc 

<210> 42 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 42 

ttcctatcta cctagtgac 

<210> 43 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 43 

ttttatagca agttgccctg 

<210> 44 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 44 

cctatgaaga tcctggtac 

<210> 45 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 



<400> 45 

tgtcatgatt ctaacaaggg 



<210> 46 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 46 

tctatttcac tcctgacatg 

<210> 47 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 47 

gtcatagggc ttaggcttc 

<210> 48 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 48 

atcatactac ccacttttcc 

<210> 49 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 49

tgtttgggaa gatgctactg 

<210> 50 
<211> 21 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 50 

ctactgaaga taacgtacat g 



<210> 51 



<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 51 

cattgattgc catgtattgg 

<210> 52 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 52 

agaaggccag aaatactcag 

<210> 53 
<211> 22 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 53 

gtacttaaat cggtaaatat gg 

<210> 54 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer
<400> 54 

ctcaagtctt aggaatgcag 

<210> 55 

<211> 20 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 

<400> 55 

gcacttaacc aggctgtatg 

<210> 56 
<211> 19 
<212> DNA 



<213> Artificial sequence 
<220> 

<223> Primer 
<400> 56 

ctcagatgac tcacatagc 

<210> 57 
<211> 22 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 57 

ctttactaga ctaattctcc tg 

<210> 58 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 58 

cagattcaag aagacagatc 

<210> 59 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 59 

gcaataattc accacacttg 

<210> 60 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 60 

ggtagttctt gtttctgctc 

<210> 61 
<211> 20 
<212> DNA 

<213> Artificial sequence 



<220> 

<223> Primer 



<400> 61 

caagtgtggt gaattattgc 

<210> 62 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 62 

gagctgaaaa gtattcagc 

<210> 63 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 63 

tgcaaaggac atagccagtg 

<210> 64 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 64 

agcctctgga gatagtatgc 

<210> 65 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 65 

ctagaacagg ggtcacagtc 

<210> 66 
<211> 18 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 



<400> 66 

ttggacttct taaacttc 

<210> 67 
<211> 21 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 67 

gcagtatgca agaaattgaa c 

<210> 68 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 68 

ggcctgtaat tttcttctg 

<210> 69 
<211> 21 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 69 

gtactgaata gatacatgta g 

<210> 70 
<211> 19 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 
<400> 70 

gtgtagcaga tcaacatag 

<210> 71 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 



<400> 71 
catcttcaag tttggtgcac 20

<210> 72 
<211> 1689 
<212> DNA 
<213> mouse 

<220> 

<223> Incomplete Spg4 DNA. 
<400> 72 

aggccgagag cgtccgcgtc ttccacaagc aggccttcga gtacatctcc attgccctgc 60 
gcatcgacga ggaagagaaa gcaggacaga aggaacaagc tgtggaatgg tataagaaag 120 
gtatcgaaga actggaaaaa ggaatcgctg ttatagttac gggccaaggt gaacagtatg 180 
aaagagctag acgtcttcaa gccaaaatga tgactaattt agttatggcc aaggaccgtt 240 
tacaacttct agagaagctg caaccagttt tgcaattttc caagtcacag acggacgtct 300 
ataacgagag tactaacctg acatgccgca atggacatct ccagtcagaa agtggagcag 360 
ttccgaagag gaaagacccc ttaacacatg ctagtaattc attgcctcga tcaaaaactg 420 
tcctgaaaag tggctccgca gggctctccg gtcaccacag ggcgcctagt tgcagtggtt 480
tgtccatggt ttctggagca agaccgggac ctggtcctgc agctaccaca cataagggta 540 
ctccaaaacc aaatagaacc aacaaacctt ctactcccac aactgcagtt cggaaaaaga 600 
aagacttgaa aaattttagg aatgtggaca gcaatcttgc taaccttata atgaatgaaa 660 
ttgttgacaa tgggacagct gttaagtttg atgacatagc cgggcaggag ctggcaaagc 720 
aagcgctgca ggagattgtc atccttcctt ctctgcggcc tgagttgttc acagggctca 780 
gagctcctgc tagaggcttg ttactcttcg gtccgccagg aaacggaaaa acaatgctgg 840 
ctaaagcagt agctgcagag tctaatgcga cctttttcaa cataagtgct gccagtttaa 900 
cttcaaaata tgtgggagaa ggagagaaat tggtgagagc tctctttgct gtggctcgag 960 
aacttcaacc atctataatt tttatagatg aagttgacag tcttttgtgt gagagacggg 1020 
aaggggagca cgacgctagc agacggctaa agacggaatt tttaatagaa tttgacgggg 1080 
tgcaatctgc tggagatgac agagtacttg taatgggtgc aactaacagg ccccaagagc 1140 
ttgatgaagc tgttctcagg cgtttcatta aacgggtata tgtgtcctta ccaaatgagg 1200 
agacaagact ccttctgctt aaaaacctgt tgtgtaaaca aggaagtcca ctgacccaaa 1260 
aagaactcgc acagcttgct agaatgaccg atggatactc tggaagtgat ctgaccgctt 1320 
tggccaagga tgcagccctg ggtcctatcc gagaactgaa gccagagcag gtgaagaata 1380 
tgtctgccag tgagatgaga aatattcgat tatctgactt cacagaatcc ttaaaaaaga 1440 
taaaacgcag tgtgagtcct cagaccttag aagcatacat acgctggaac aaggattttg 1500 
gagacaccac tgtttaaagg aatggatgcc tctgtgagcc catagaacat cgcacttcac 1560 
aggaaacaag agctttggct acaggaaccc agacttcgtt tacaggacgt tttagagttt 1620 
tcatttttgt gcaccaaact tgaagaggaa caagaagaca gacctaaata aaatatgcaa 1680 
tatgaatgg 1689

<210> 73 
<211> 504 
<212> PRT 
<213> mouse 

<220> 

<223> Incomplete murine spastin. 
<400> 73 

Ala Glu Ser Val Arg Val Phe His Lys Gln Ala Phe Glu Tyr Ile Ser
1 5 10 15

Ile Ala Leu Arg Ile Asp Glu Glu Glu Lys Ala Gly Gln Lys Glu Gln
20 25 30

Ala Val Glu Trp Tyr Lys Lys Gly Ile Glu Glu Leu Glu Lys Gly Ile
35 40 45

Ala Val Ile Val Thr Gly Gln Gly Glu Gln Tyr Glu Arg Ala Arg Arg
50 55 60

Leu Gln Ala Lys Met Met Thr Asn Leu Val Met Ala Lys Asp Arg Leu
65 70 75 80

Gln Leu Leu Glu Lys Leu Gln Pro Val Leu Gln Phe Ser Lys Ser Gln
85 90 95

Thr Asp Val Tyr Asn Glu Ser Thr Asn Leu Thr Cys Arg Asn Gly His
100 105 110

Leu Gln Ser Glu Ser Gly Ala Val Pro Lys Arg Lys Asp Pro Leu Thr
115 120 125

His Ala Ser Asn Ser Leu Pro Arg Ser Lys Thr Val Leu Lys Ser Gly
130 135 140

Ser Ala Gly Leu Ser Gly His His Arg Ala Pro Ser Cys Ser Gly Leu
145 150 155 160

Ser Met Val Ser Gly Ala Arg Pro Gly Pro Gly Pro Ala Ala Thr Thr
165 170 175

His Lys Gly Thr Pro Lys Pro Asn Arg Thr Asn Lys Pro Ser Thr Pro
180 185 190

Thr Thr Ala Val Arg Lys Lys Lys Asp Leu Lys Asn Phe Arg Asn Val
195 200 205

Asp Ser Asn Leu Ala Asn Leu Ile Met Asn Glu Ile Val Asp Asn Gly
210 215 220

Thr Ala Val Lys Phe Asp Asp Ile Ala Gly Gln Glu Leu Ala Lys Gln
225 230 235 240

Ala Leu Gln Glu Ile Val Ile Leu Pro Ser Leu Arg Pro Glu Leu Phe
245 250 255

Thr Gly Leu Arg Ala Pro Ala Arg Gly Leu Leu Leu Phe Gly Pro Pro
260 265 270

Gly Asn Gly Lys Thr Met Leu Ala Lys Ala Val Ala Ala Glu Ser Asn
275 280 285

Ala Thr Phe Phe Asn Ile Ser Ala Ala Ser Leu Thr Ser Lys Tyr Val
290 295 300

Gly Glu Gly Glu Lys Leu Val Arg Ala Leu Phe Ala Val Ala Arg Glu
305 310 315 320

Leu Gln Pro Ser Ile Ile Phe Ile Asp Glu Val Asp Ser Leu Leu Cys
325 330 335

Glu Arg Arg Glu Gly Glu His Asp Ala Ser Arg Arg Leu Lys Thr Glu
340 345 350

Phe Leu Ile Glu Phe Asp Gly Val Gln Ser Ala Gly Asp Asp Arg Val
355 360 365

Leu Val Met Gly Ala Thr Asn Arg Pro Gln Glu Leu Asp Glu Ala Val
370 375 380

Leu Arg Arg Phe Ile Lys Arg Val Tyr Val Ser Leu Pro Asn Glu Glu
385 390 395 400

Thr Arg Leu Leu Leu Leu Lys Asn Leu Leu Cys Lys Gln Gly Ser Pro
405 410 415

Leu Thr Gln Lys Glu Leu Ala Gln Leu Ala Arg Met Thr Asp Gly Tyr
420 425 430

Ser Gly Ser Asp Leu Thr Ala Leu Ala Lys Asp Ala Ala Leu Gly Pro
435 440 445

Ile Arg Glu Leu Lys Pro Glu Gln Val Lys Asn Met Ser Ala Ser Glu
450 455 460

Met Arg Asn Ile Arg Leu Ser Asp Phe Thr Glu Ser Leu Lys Lys Ile
465 470 475 480

Lys Arg Ser Val Ser Pro Gln Thr Leu Glu Ala Tyr Ile Arg Trp Asn
485 490 495

Lys Asp Phe Gly Asp Thr Thr Val
500



<210> 74 
<211> 24 
<212> DNA 

<213> Homo sapiens 
<220> 

<223> SPG4 gene splice acceptor site. 
<400> 74 

attttttatt ttaaagcagg acag 



<210> 75 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site. 



<400> 75 

aatttttttc tttcaggtga acag 



<210> 76 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site 



<400> 76 

cttctctgtt gcatagagaa gatg 



<210> 77 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site 
<400> 77 

actttttcct tgtcagaaag tgga 



<210> 78 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site 
<400> 78 

ttttgtatcc tttaagggta ctcc 



<210> 79 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site 
<400> 79 

aggtcttgtt tcttagtgga acag 



<210> 80 

<211> 24 

<212> DNA 

<213> Homo sapiens 
<220>

<223> SPG4 gene splice acceptor site. 
<400> 80 

agtatatatt ttttagttgt tcac 



<210> 81 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site. 
<400> 81 

cttgtgattt ttaaaggcta aagc 24 



<210> 82 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site. 
<400> 82 

taatgctttg ttttaggtgg gaga 24 



<210> 83 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site. 
<400> 83 

cttgtatttc ctctagatga agtt 24 



<210> 84 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site. 
<400> 84 

gattttttgc ttgtaggtac agtc 24 



<210> 85 
<211> 24 
<212> DNA

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice acceptor site. 
<400> 85 

ggattttttt ttttaggcgt ttca 24 



<210> 86 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice acceptor site. 
<400> 86 

ttttaatatt tttcagacaa gact 



<210> 87 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice acceptor site. 
<400> 87 

tccttccctt cctcagaatg actg 



<210> 88 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice acceptor site. 
<400> 88 

cttttatgtt ttacagaact aaaa 



<210> 89 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice acceptor site. 



<400> 89 

ctttttaaaa atctagatga gaaa 24



<210> 90

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice donor site. 
<400> 90 

tgagaaaggt aactaggggg ctgg 24 



<210> 91 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice donor site. 
<400> 91 

aggacaaggt aagattgtat ttgt 24 



<210> 92 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice donor site. 
<400> 92 

acttctaggt atcaattaat gtat 24 



<210> 93 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice donor site. 
<400> 93 

ccagtcaggt gggtttaggt taac 24 



<210> 94 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice donor site. 
<400> 94

ctcataaggt attctgggac agta 24 



<210> 95 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice donor site. 



<400> 95 

gtggacaagt aagttttgcc atct 



<210> 96 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site. 
<400> 96 

ggcctgaggt aagaacttta tatt 

<210> 97 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site. 
<400> 97 

caatgctggt aagggttctc ttca 



<210> 98 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site. 
<400> 98 

caaaatacgt gagtgctctg tttc 



<210> 99 

<211> 24 

<212> DNA 

<213> Homo sapiens 



<220> 

<223> SPG4 gene splice donor site 
<400> 99 

ttttataggt aagaacatat tttc 

<210> 100 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site 
<400> 100 

ttgatggtgt aagtgttgat tatg 



<210> 101 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site 
<400> 101 

gttctcaggt agggagattt atat 

<210> 102 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site 
<400> 102 

atgaggaggt atgtatctgt gttt 



<210> 103 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site 
<400> 103 

cttgctaggt gagtaatttg gatt 

<210> 104 

<211> 24 

<212> DNA 

<213> Homo sapiens 
<220>

<223> SPG4 gene splice donor site. 
<400> 104 

tatccgaggt aggtatacaa gagc 24



<210> 105 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> SPG4 gene splice donor site. 
<400> 105 

ccagtgaggt atagtatttt acaa 24 



<210> 106 

<211> 2152 

<212> DNA 

<213> Mus musculus 

<220> 

<223> Complete Spg4 DNA 
<400> 106 

gctcccgggc tccggcgggc gcgcgagcgg ctccgtgggc ccccgccgcc gcagtggcag 60
tggccgccgc cgccgcttgg tcgccgtcgg tctgcgggaa gcgggttatg gcggcggcgg 120
cagtgggagc tgtgaatgag ttctccggcc ggacgacgga agaagaaagg ctcgggcggc 180
gcgagcccgg cgcccgccag gcctccgccc cccgccgcgg tccccgcccc tgccgccggc 240
ccggcccctg cggccggctc gccgcctaag cggaacctgt cttctttctc gtccccgctg 300
gtcgtcggct tcgccctgct gcgcctgctg gcctgccacc tggggctcct cttcgcgtgg 360
ctctgccagc gcttctcccg cgccctcatg gccgccaaga ggagctccgg gaccgcgccg 420
gcgcccgcct cgccctcgcc cccagcgccc ggaccgggtg gcgaggccga gagcgtccgc 480
gtcttccaca agcaggcctt cgagtacatc tccattgccc tgcgcatcga cgaggaagag 540
aaagcaggac agaaggaaca agctgtggaa tggtataaga aaggtatcga agaactggaa 600
aaaggaatcg ctgttatagt tacgggccaa ggtgaacagt atgaaagagc tagacgtctt 660
caagccaaaa tgatgactaa tttagttatg gccaaggacc gtttacaact tctagagaag 720
ctgcaaccag ttttgcaatt ttccaagtca cagacggacg tctataacga gagtactaac 780
ctgacatgcc gcaatggaca tctccagtca gaaagtggag cagttccgaa gaggaaagac 840
cccttaacac atgctagtaa ttcattgcct cgatcaaaaa ctgtcctgaa aagtggctcc 900
gcagggctct ccggtcacca cagggcgcct agttgcagtg gtttgtccat ggtttctgga 960
gcaagaccgg gacctggtcc tgcagctacc acacataagg gtactccaaa accaaataga 1020
accaacaaac cttctactcc cacaactgca gttcggaaaa agaaagactt gaaaaatttt 1080
aggaatgtgg acagcaatct tgctaacctt ataatgaatg aaattgttga caatgggaca 1140
gctgttaagt ttgatgacat agccgggcag gagctggcaa agcaagcgct gcaggagatt 1200
gtcatccttc cttctctgcg gcctgagttg ttcacagggc tcagagctcc tgctagaggc 1260
ttgttactct tcggtccgcc aggaaacgga aaaacaatgc tggctaaagc agtagctgca 1320
gagtctaatg cgaccttttt caacataagt gctgccagtt taacttcaaa atatgtggga 1380
gaaggagaga aattggtgag agctctcttt gctgtggctc gagaacttca accatctata 1440
atttttatag atgaagttga cagtcttttg tgtgagagac gggaagggga gcacgacgct 1500
agcagacggc taaagacgga atttttaata gaatttgacg gggtgcaatc tgctggagat 1560
gacagagtac ttgtaatggg tgcaactaac aggccccaag agcttgatga agctgttctc 1620
aggcgtttca ttaaacgggt atatgtgtcc ttaccaaatg aggagacaag actccttctg 1680
cttaaaaacc tgttgtgtaa acaaggaagt ccactgaccc aaaaagaact cgcacagctt 1740
gctagaatga ccgatggata ctctggaagt gatctgaccg ctttggccaa ggatgcagcc 1800
ctgggtccta tccgagaact gaagccagag caggtgaaga atatgtctgc cagtgagatg 1860
agaaatattc gattatctga cttcacagaa tccttaaaaa agataaaacg cagtgtgagt 1920
cctcagacct tagaagcata catacgctgg aacaaggatt ttggagacac cactgtttaa 1980
aggaatggat gcctctgtga gcccatagaa catcgcactt cacaggaaac aagagctttg 2040
gctacaggaa cccagacttc gtttacagga cgttttagag ttttcatttt tgtgcaccaa 2100
acttgaagag gaacaagaag acagacctaa ataaaatatg caatatgaat gg 2152



<210> 107 
<211> 614 
<212> PRT 

<213> Mus musculus 
<220> 

<223> Complete murine spastin 
<400> 107 

Met Ser Ser Pro Ala Gly Arg Arg Lys Lys Lys Gly Ser Gly Gly Ala
1 5 10 15

Ser Pro Ala Pro Ala Arg Pro Pro Pro Pro Ala Ala Val Pro Ala Pro
20 25 30

Ala Ala Gly Pro Ala Pro Ala Ala Gly Ser Pro Pro Lys Arg Asn Leu
35 40 45

Ser Ser Phe Ser Ser Pro Leu Val Val Gly Phe Ala Leu Leu Arg Leu
50 55 60

Leu Ala Cys His Leu Gly Leu Leu Phe Ala Trp Leu Cys Gln Arg Phe
65 70 75 80

Ser Arg Ala Leu Met Ala Ala Lys Arg Ser Ser Gly Thr Ala Pro Ala
85 90 95

Pro Ala Ser Pro Ser Pro Pro Ala Pro Gly Pro Gly Gly Glu Ala Glu
100 105 110

Ser Val Arg Val Phe His Lys Gln Ala Phe Glu Tyr Ile Ser Ile Ala
115 120 125

Leu Arg Ile Asp Glu Glu Glu Lys Ala Gly Gln Lys Glu Gln Ala Val
130 135 140

Glu Trp Tyr Lys Lys Gly Ile Glu Glu Leu Glu Lys Gly Ile Ala Val
145 150 155 160

Ile Val Thr Gly Gln Gly Glu Gln Tyr Glu Arg Ala Arg Arg Leu Gln
165 170 175

Ala Lys Met Met Thr Asn Leu Val Met Ala Lys Asp Arg Leu Gln Leu
180 185 190

Leu Glu Lys Leu Gln Pro Val Leu Gln Phe Ser Lys Ser Gln Thr Asp
195 200 205

Val Tyr Asn Glu Ser Thr Asn Leu Thr Cys Arg Asn Gly His Leu Gln
210 215 220

Ser Glu Ser Gly Ala Val Pro Lys Arg Lys Asp Pro Leu Thr His Ala
225 230 235 240

Ser Asn Ser Leu Pro Arg Ser Lys Thr Val Leu Lys Ser Gly Ser Ala
245 250 255

Gly Leu Ser Gly His His Arg Ala Pro Ser Cys Ser Gly Leu Ser Met
260 265 270

Val Ser Gly Ala Arg Pro Gly Pro Gly Pro Ala Ala Thr Thr His Lys
275 280 285

Gly Thr Pro Lys Pro Asn Arg Thr Asn Lys Pro Ser Thr Pro Thr Thr
290 295 300

Ala Val Arg Lys Lys Lys Asp Leu Lys Asn Phe Arg Asn Val Asp Ser
305 310 315 320

Asn Leu Ala Asn Leu Ile Met Asn Glu Ile Val Asp Asn Gly Thr Ala
325 330 335

Val Lys Phe Asp Asp Ile Ala Gly Gln Glu Leu Ala Lys Gln Ala Leu
340 345 350

Gln Glu Ile Val Ile Leu Pro Ser Leu Arg Pro Glu Leu Phe Thr Gly
355 360 365

Leu Arg Ala Pro Ala Arg Gly Leu Leu Leu Phe Gly Pro Pro Gly Asn
370 375 380

Gly Lys Thr Met Leu Ala Lys Ala Val Ala Ala Glu Ser Asn Ala Thr
385 390 395 400

Phe Phe Asn Ile Ser Ala Ala Ser Leu Thr Ser Lys Tyr Val Gly Glu
405 410 415

Gly Glu Lys Leu Val Arg Ala Leu Phe Ala Val Ala Arg Glu Leu Gln
420 425 430

Pro Ser Ile Ile Phe Ile Asp Glu Val Asp Ser Leu Leu Cys Glu Arg
435 440 445

Arg Glu Gly Glu His Asp Ala Ser Arg Arg Leu Lys Thr Glu Phe Leu
450 455 460

Ile Glu Phe Asp Gly Val Gln Ser Ala Gly Asp Asp Arg Val Leu Val
465 470 475 480

Met Gly Ala Thr Asn Arg Pro Gln Glu Leu Asp Glu Ala Val Leu Arg
485 490 495

Arg Phe Ile Lys Arg Val Tyr Val Ser Leu Pro Asn Glu Glu Thr Arg
500 505 510

Leu Leu Leu Leu Lys Asn Leu Leu Cys Lys Gln Gly Ser Pro Leu Thr
515 520 525

Gln Lys Glu Leu Ala Gln Leu Ala Arg Met Thr Asp Gly Tyr Ser Gly
530 535 540

Ser Asp Leu Thr Ala Leu Ala Lys Asp Ala Ala Leu Gly Pro Ile Arg
545 550 555 560

Glu Leu Lys Pro Glu Gln Val Lys Asn Met Ser Ala Ser Glu Met Arg
565 570 575

Asn Ile Arg Leu Ser Asp Phe Thr Glu Ser Leu Lys Lys Ile Lys Arg
580 585 590

Ser Val Ser Pro Gln Thr Leu Glu Ala Tyr Ile Arg Trp Asn Lys Asp
595 600 605

Phe Gly Asp Thr Thr Val
610



